On a Kernel Regression Approach to Machine Translation

Size: px
Start display at page:

Download "On a Kernel Regression Approach to Machine Translation"

Transcription

1 On a Kernel Regression Approach to Machine Translation Nicolás Serrano, Jesús Andrés-Ferrer, and Francisco Casacuberta Instituto Tecnológico de Informática {nserrano,jandres,fcn}@iti.upv.es Abstract. We present a machine translation framework based on Kernel Regression techniques. The translation process is modeled as a stringto-string mapping. For doing so, first both source and target strings are mapped to a natural vector space obtaining feature vectors. Afterwards, a translation mapping is defined from the source feature vector to the target feature vector. This translation mapping is learnt by linear regression. Once the target feature vector is obtained, we use a multi-graph search to find all the possible target strings whose mappings correspond to the translated feature vector. We present experiments in a small but relevant task showing encouraging results. 1 Introduction Machine translation deals with the problem of mapping sentences x from a source language X to a target language Y. Due to the complexity of the translation problem, the relationships between these two languages cannot be properly enumerated as a set of rules. Consequently, several authors have approached this problem as a statistical pattern recognition problem [1]. However, we present a new method for machine translation based on the work in [2], where a regression framework for learning transductions is proposed. Similar methods have been applied to many fields such as text categorization [4] or machine translation [5]. In our approach, both source and target strings are mapped to natural feature vectors. Then a translation mapping is learnt between feature vector domains. Given a source string x, the translation process consists in mapping it to a feature vector u and then, mapping this vector u to its corresponding target feature vector v. The latter mapping, the so-called translation mapping is learnt by regression techniques. Finally, the pre-image set for the target feature vector v must be found. This problem is referred as the Pre-image problem. The focus of this paper is to solve this problem in a regression-based machine translation framework. Some previous works such as [5], have explore the idea of learning the translation as a regression problem. These works do not handle the pre-image problem directly but use the model as a score to the standard statistical machine translation systems. Specifically, they use a phrased-based statistical machine translation model [6] and use the kernel regression model as score to the phrased-based

2 2 N. Serrano, J. Andres-Ferrer and F. Casacuberta search. This approach does not make the best of the regression approach, as proposed in [2], losing some of its main advantages. On the contrary, the pre-image search proposed in [2] is adapted in this work to the peculiarities in the machine translation problem. This aim is achieved by building the DeBruijn Graph [7] for a target feature vector and then finding eulerian paths within it. Nevertheless, due to the high dimensionality of the feature vectors, severe problems arise. 2 The Training Process The aim of machine translation is to learn a mapping from a source language X to a target language Y, i.e. f : X Y, where X is the source vocabulary and Y is the target vocabulary. In statistical machine translation the optimal translation function f is designed based on Bayes decision theory [1] where p(y x) is approximated by f(x) = arg maxp(y x) (1) y Y exp(c(x, y)) p(y x) = y Y exp(c(x, y )) (2) since the exponential function is monotonically increasing, Eq. (1) simplifies to f(x) = argmaxc(x, y) (3) y Y where C(x, y) is a score function, which is modelled by a feature set {h k } K 1 of theoretically and heuristically motivated functions [8] C(x, y) = k λ k h k (x, y) (4) where λ k are the model parameters that weight the feature function relevance. In state-of-art translation systems [9], these feature functions are mainly modelled by logarithms of probability models, such as the phrase-based models [6]. However, we propose a method for learning the translation function f based on regression techniques. In order to configure the regression framework, the source strings x X are mapped to the natural domain u U N D1 via a source mapping function φ X, i.e. u = φ X (x). Similarly, the target strings y Y are mapped to another natural domain, v V N D2, via a target mapping function φ Y, i.e. φ Y (y). Although both source and target mappings are not required to be of the same type, we will henceforth assume so. Then, we define the mappings, φ X and φ Y, as the function, φ n, that generates a n- gram count vector from a given string, x and y respectively. More accurately, φ n (x) = { x u1,..., x un } with u i standing for the i-th n-gram in lexicografic

3 On a Kernel Regression Approach to Machine Translation 3 order of the vocabulary X, and x u is the number of occurrences of u in x. For instance, the string x = aaabb of the language X = {a, b} will correspond to the bigram mapping output u = φ 2 (x) = (2, 1, 0, 1). The feature vector mapping u U N D1 is useful when comparing strings, since it helps to define a kernel between strings as follows: K n (x, x ) = φ n (x)φ n (x ) T = u Σ n x u x u (5) where K n (x, x ) ranges from the number of common n-grams if both strings are equal, to zero if they totally differ. Once the string-to-vector mapping is defined, the training problem is restated as a regression problem where a source-to-target mapping must to be found, i.e. finding the translation mapping h : U V. Given such a mapping h and two sentences, a source sentence x and its translation y; we define a string to target feature mapping, g : X V, as follows g(x) = h(φ X (x)) = φ Y (y) (6) Given a source string x, the proposed translation method consists in mapping it to U via φ X (x) and then, mapping it to V with the translation function h(x). Afterwards, given the target feature vector v obtained from the translation mapping, we compute its pre-image set φ 1 Y (v). Figure 1 depicts a general scheme of the whole translation process. X φ X( ) U f( ) g( ) h( ) Y φ Y ( ) φ 1 Y ( ) V Fig. 1. Machine translation kernel regression scheme 2.1 The Linear Regression The function h maps between two natural domains h : N D1 N D2. Since discrete regression problems yield complicated training algorithms, we learn an extension of this function h that approximates the natural domains by real domains, i.e. h : R D1 R D2. We further assume that our regression problem is linear, that is to say, that the mapping function h(u), and hereby its extension h, can be aproximated with a linear function, i.e. h(u) = Wu. Note that in this case the string-to-feature vector mapping g, is simplified to g(x) = Wφ X (x). Given a set of sentences, (x m, y m ) M 1, we can learn the optimal Ŵ matrix using

4 4 N. Serrano, J. Andres-Ferrer and F. Casacuberta the regularised least square as follows Ŵ = arg min W M Wφ X (x m ) φ Y (y m ) 2 + γ W 2 F (7) m=1 where F refers to the Frobenius norm and where γ > 0 is the regularization term. The aim of the regularisation term is to avoid the weights matrix W to contain large weights, which is a clear evidence of overtraining. The solution to Eq. (7) is unique and is found by differentiating the expression and equaling it to zero Ŵ = M Y (K X + γi) 1 M T X (8) where M Y = [φ Y (y 1 ),..., φ Y (y M )] is the D 2 M matrix of which j-th column vector is φ Y (y j ), where M X = [φ X (x 1 ),..., φ X (x M )] is the analogous for the source strings, and where K X is the Gram matrix associated to the kernel K n applied to the source samples, i.e. [K X ] ij = K n (x i, x j ) for all i, j such that 1 i, j M. 3 The decoding Problem In the proposed framework, the regression output for a given source string x, is a feature vector v which represents the occurrence counts of each target n-gram in its respective target translation y. Any target string y, that produces the same count vector v by means of the target mapping φ Y (y) is regarded as a possible translation of the source sentence x. Therefore, the decoding problem or the pre-image problem is theoretically constrained to a subdomain of Y. The decoding or pre-image problem is stated as the problem of finding which target sentences are represented as given a feature vector, i.e., to find the set (g(x)). The pre-image of g(x) is not unique since each reordering of the same counts leads to the same target feature vector v. To further understand the problem we give a simple example. We assume the target language in the example is given by Y = {a, b} and that the mapping function is φ 2, i.e. counting the number of possible bigrams. This implies that the count vector has four dimensions, one for each of the possible bigrams {aa, ab, ba, bb}. Since the dimensions are sorted in lexicographic order, the first dimension v 1 represents the occurrences of the bigram aa, the second dimension v 2, the occurrences of ab; and so on. If the regression output for a source sentence x is v = g(x) = {2, 1, 1, 0}, φ 1 Y then its pre-image set is φ 1 Y (v) = {aaaba, aabaa, baaab}. When dealing with natural feature vectors the pre-image problem has a wellknown solution [7]. First, it is needed to build the so-called DeBruijn graph as follows: all the (n 1)-gram sequences represent a different node in the graph, and edges are added going from a node, a 1 a 2... a n 1, to a node, a 2... a n 1 a n, if they have a positive weight, which is the count of the n-gram a 1... a n in v. In

5 On a Kernel Regression Approach to Machine Translation 5 2 a 1 1 b Fig.2. A DeBruijn graph for bigrams and the feature vector v = {2, 1,1, 0}. this graph, each Eulerian path 1 corresponds to a target string in the pre-image set. The DeBruijn graph of the proposed example is shown Fig. 2. At this point several problems arise in practice: The translation regresion is real-valued h instead of natural-valued h. Eulerian paths are not correctly defined in this case, since the DeBruijn technique cannot be directly applied in real-valued feature vectors. There are unknown source and target n-grams, those not appearing at the training samples. This makes the target feature vector v not to define a unique connected DeBruijn graph, but one with multiple isolated connected components. Even if we obtain all the possible eulerian paths within the graph, that represents all the possible translation, there is no way to select the proper translation corresponding to the source string. 3.1 The Aliasing Problem The result obtained from the regression is a real-valued g function where the meaning of each of the target vector dimensions v k is not clear. For instance, we take the example in which the target vocabulary is given by Y = {a, b} and a bigram mapping is used φ 2 ( ). In this example, the obtained target feature vector for a given source string could be v = (1.2, 0.5, 0.8, 0.25) instead of the perfect regression v = (2, 1, 1, 0). For deeply understanding the implication of any approximation, we must analyse the effect of variations in the natural-valued feature vector. Assuming that the correct regresion is given by v = (2, 1, 1, 0) then the pre-image set is φ 1 (v) = {abaa, aaba, baaab}. In the case in which the regresion produces v = (2, 1, 1, 1), the pre-image set changes to φ 1 (v) = {abbaa, aabba, baaabb, bbaaab}. Therefore, adding (or subtracting) 1 to any count dimension incurs in one extra (or less) word. The search method originally proposed in [2] is to round the real vector and then, build a DeBruijn graph from it and search eulerian paths whittin the graph. However, the best solution takes into account the previously discussed regression errors. For this aim we define the Levensthein loan 2 as the real increment or 1 A path inside a graph visiting each edge the number of times indicated by its weight 2 Named after the Levensthein distance [10]

6 6 N. Serrano, J. Andres-Ferrer and F. Casacuberta decrement that must be added to a given real vector in order to convert it to the natural domain. The Levensthein loan can be understood as the average Levensthein error of the correct hypothesis with respect to unknown reference. In order to find the pre-image for real-valued feature vectors, we build a weighted graph in a similar way the DeBruijn graph is built. The edges represent the real count of each n-gram according to the real-valued feature vector, instead of actual natural counts. During the search, the Levensthein loan is used to add or substract any necessary amount to the eulerian path simulating, in this way, a natural vector. In summary, the search algorithm looks for a path that uses the more of the weights in the graph and ask for the lowest possible loan. 3.2 The Unknown n-grams Problem In practice, it is common to find unknown n-grams when mapping strings to the feature vector count domain. This problem is stressed as the n becomes larger. Usually, this leads to DeBruijn graphs with isolated connected components and without eulerian paths covering the full sentence. A way to amend the problem is to apply backing-off to lower order models during the search. That is to say, when searching for a path, the search is simultaneously performed in different l-gram graphs (1 l n). Higher values of l are more restrictive during the search and also capture more structural dependencies but more number of parameters must be estimated in the regression. When the search at the highest order graph cannot continue building an eulerian path, the search backs off one order level with a penalisation factor, similarly to backing-off smoothing in language modeling [11]. This process goes on recursively until the unigram level if needed. 3.3 Adding Scores to the Search Lenvensthein loan score is not enough to select just one target string from the real-valued feature vector. Recall that even in the theoretical situation this would not be possible since several sentences can be built from different reordering of the same n-gram counts. These sentences have the same loan and therefore, we have no way to discriminate among them. Obviously, this problem is solved by adding an additional score to the target string y and consequently to each path in the pre-image search. Although, several scores can be proposed such as the probability given by statistical translations models, we have adopted in this first work a language model probability, specifically the n-gram language model [11]. In summary, the score in Eq. (3) for a given pair (x, y) is defined as follows for the proposed kernel regression model ( N ) C(x, y) = λ i sc i (x, y) + λ lm p n (y) + λ l exp ( y α) (9) i=1 where sc i is the Levensthein loan for the target string y at the i-th level graph; p n (y) is the language probability model for the target sentence y, and the last

7 On a Kernel Regression Approach to Machine Translation 7 term represents a word-bonus score to counteract the language model bias towards short sentences. 4 Experimental Results We have carried out experiments on the categorized EuTrans corpus [12]. This corpus comprises tourist requests at hotel front desks. Categorized EuTrans consists in pair of sentences divided into three sets: 9000 sentences to train, 1000 sentences were left out for validation and 3000 sentences for testing. There are 420 source and 230 target categories. The corpora perplexity is 3.95 and 4.73 for the target and source language respectively. Three kernel n-gram models where trained: one built from bigrams consisting on 2145 source bigrams and 929 target bigrams, another built from trigrams consisting on 5048 source trigrams and 2102 target trigrams; and the concatenation of both as described in 3.2. We trained increasing sizes of n-gram language models from 2 to 5 estimated by the SRILM toolkit [13]. The score weights, λ, in Eq. (9) were adjusted by the Downhill Simplex algorithm [14] for optimizing the BLEU score [15] on the validation set. The proposed system was compared with the Moses [9] baseline system, in which we have limited the phrase length to 7 allowing reordering and optimizing the parameters on a validation set. The Table 1. Results on categorized EuTrans. N stands for the order of the kernel whilst the LM order row stands for the order of the n-gram language model (0 no LM). Model N = 2 N = 3 N = 3 + (N = 2) Backoff LM order BLEU Moses baseline system scored 92.3 points of BLEU compared to the 95.5 points of our best system. A practical behaviour analysis of the proposed method is enclosed in the table 1. The results on this simple task are encouraging since almost all results surpass the baseline system. It can be observed that adding language information considerably improves the bigram system. However the trigram system is not benefited from the language model information, probably because of the corpus simplicity. Finally as expected, the search smoothing with back-off improves the results. 5 Conclusions and Future Work Kernel regression models are a new and encouraging approach to the machine translation field. In this work we have proposed a complete regression framework to machine translation by proposing a regression-based search. In contrast, other works perform the search by means of a phrase-based decoding. In addition, we

8 8 N. Serrano, J. Andres-Ferrer and F. Casacuberta have explored the idea of adding language information to rank the target strings among all the pre-image set. In a future work, different models apart from the n-gram language model will be added to rank the target strings, such as IBM word models and phrased-based models. Nevertheless several problems arise when complex corpus are used. To deal with them, further optimizations such as perform the model estimation through Cholesky incomplete decomposition or subset selection techniques are left for further research. Other possible improvements in the process would be using different kernel functions for comparing strings and use quadratic regression instead of linear regression as the regression model. Acknowledgments This work has been supported by the Spanish MEC under FPU scholarship AP , the valencian Conselleria d Empresa i Cincia grant CTBPRA/2005/004, grant Consolider Ingenio 2010 CSD by the EC (FEDER) and the Spanish MEC under grant TIN CO2-01. References 1. P.F. Brown et al. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistic. 19(2). pp (1993) 2. C. Cortes, M. Mohri, and J. Weston. A general regression technique for learning transductions. Proc. of the 22nd international conference on Machine learning, C. Cortes, M. Mohri, and J. Weston. A General Regression Framework for Learning String-to-String Mappings. In Predicting Structured Data. The MIT Press, N. Cancedda, E. Gaussier, C. Goutte, J. Render: Word-Sequence Kernels. Journal of Machine Learning Research 3. pp (2003) 5. Z. Wang and J. Shawe-Taylor: Kernel Regression Based Machine Translation. NAACL HLT 2007, Companion Volume, pp P. Koehn, F.J. Och and D. Marcu. Statistical phrase based translation. In Proceedings of HLT/NACL. (2003) 7. J.L. Gross and J. Yellen. Handbook of Graph Theory. Published by CRC Press. Pp ISBN: Franz Josef Och, Hermann Ney. The alignment template approach to statistical machine translation. Computational Linguistics, vol. 30(4), pp P. Koehn. H. Hoang. A. Birch and C. Callison-Burch. Moses: Open Source Toolkit for Statistical Machine Translation. Proc. of ACL 07 pp V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, pp (1966) 11. J.T. Goodman.An empirical study of smoothing techniques for language modelling. In Proc. of ACL 96, pp F. Casacuberta et al. Some approaches to statistical and finite-state speech-tospeech tranlation. Computer Speech and Language 18. pp (2004) 13. SRILM - An Extensible Language Modeling Toolkit, in Proc. Intl. Conf. Spoken Language Processing, Denver, Colorado, September J. A. Nelder and R. Mead A Simplex Method for Function Minimization The Computer Journal Vol 7, pp (1965) 15. Papineni, K., Roukos, S., Ward, T., and Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation in ACL-2002: pp

RegMT System for Machine Translation, System Combination, and Evaluation

RegMT System for Machine Translation, System Combination, and Evaluation RegMT System for Machine Translation, System Combination, and Evaluation Ergun Biçici Koç University 34450 Sariyer, Istanbul, Turkey ebicici@ku.edu.tr Deniz Yuret Koç University 34450 Sariyer, Istanbul,

More information

Tuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017

Tuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017 Tuning Philipp Koehn presented by Gaurav Kumar 28 September 2017 The Story so Far: Generative Models 1 The definition of translation probability follows a mathematical derivation argmax e p(e f) = argmax

More information

THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM

THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Jamie Brunning, Bill Byrne NIST Open MT 2009 Evaluation Workshop Ottawa, The CUED SMT system Lattice-based

More information

Statistical Machine Translation Part IV Log-Linear Models

Statistical Machine Translation Part IV Log-Linear Models Statistical Machine Translation art IV Log-Linear Models Alexander Fraser Institute for Natural Language rocessing University of Stuttgart 2011.11.25 Seminar: Statistical MT Where we have been We have

More information

Word Graphs for Statistical Machine Translation

Word Graphs for Statistical Machine Translation Word Graphs for Statistical Machine Translation Richard Zens and Hermann Ney Chair of Computer Science VI RWTH Aachen University {zens,ney}@cs.rwth-aachen.de Abstract Word graphs have various applications

More information

Discriminative Training for Phrase-Based Machine Translation

Discriminative Training for Phrase-Based Machine Translation Discriminative Training for Phrase-Based Machine Translation Abhishek Arun 19 April 2007 Overview 1 Evolution from generative to discriminative models Discriminative training Model Learning schemes Featured

More information

Stone Soup Translation

Stone Soup Translation Stone Soup Translation DJ Hovermale and Jeremy Morris and Andrew Watts December 3, 2005 1 Introduction 2 Overview of Stone Soup Translation 2.1 Finite State Automata The Stone Soup Translation model is

More information

Training and Evaluating Error Minimization Rules for Statistical Machine Translation

Training and Evaluating Error Minimization Rules for Statistical Machine Translation Training and Evaluating Error Minimization Rules for Statistical Machine Translation Ashish Venugopal School of Computer Science Carnegie Mellon University arv@andrew.cmu.edu Andreas Zollmann School of

More information

PBML. logo. The Prague Bulletin of Mathematical Linguistics NUMBER??? FEBRUARY Improved Minimum Error Rate Training in Moses

PBML. logo. The Prague Bulletin of Mathematical Linguistics NUMBER??? FEBRUARY Improved Minimum Error Rate Training in Moses PBML logo The Prague Bulletin of Mathematical Linguistics NUMBER??? FEBRUARY 2009 1 11 Improved Minimum Error Rate Training in Moses Nicola Bertoldi, Barry Haddow, Jean-Baptiste Fouet Abstract We describe

More information

Discriminative Training with Perceptron Algorithm for POS Tagging Task

Discriminative Training with Perceptron Algorithm for POS Tagging Task Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu

More information

Power Mean Based Algorithm for Combining Multiple Alignment Tables

Power Mean Based Algorithm for Combining Multiple Alignment Tables Power Mean Based Algorithm for Combining Multiple Alignment Tables Sameer Maskey, Steven J. Rennie, Bowen Zhou IBM T.J. Watson Research Center {smaskey, sjrennie, zhou}@us.ibm.com Abstract Alignment combination

More information

Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition

Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition by Hong-Kwang Jeff Kuo, Brian Kingsbury (IBM Research) and Geoffry Zweig (Microsoft Research) ICASSP 2007 Presented

More information

LANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING

LANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING LANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING Joshua Goodman Speech Technology Group Microsoft Research Redmond, Washington 98052, USA joshuago@microsoft.com http://research.microsoft.com/~joshuago

More information

On LM Heuristics for the Cube Growing Algorithm

On LM Heuristics for the Cube Growing Algorithm On LM Heuristics for the Cube Growing Algorithm David Vilar and Hermann Ney Lehrstuhl für Informatik 6 RWTH Aachen University 52056 Aachen, Germany {vilar,ney}@informatik.rwth-aachen.de Abstract Current

More information

Tuning Machine Translation Parameters with SPSA

Tuning Machine Translation Parameters with SPSA Tuning Machine Translation Parameters with SPSA Patrik Lambert, Rafael E. Banchs TALP Research Center, Jordi Girona Salgado 1 3. 0803 Barcelona, Spain lambert,rbanchs@gps.tsc.upc.edu Abstract Most of statistical

More information

Opinion Mining by Transformation-Based Domain Adaptation

Opinion Mining by Transformation-Based Domain Adaptation Opinion Mining by Transformation-Based Domain Adaptation Róbert Ormándi, István Hegedűs, and Richárd Farkas University of Szeged, Hungary {ormandi,ihegedus,rfarkas}@inf.u-szeged.hu Abstract. Here we propose

More information

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora Chris Callison-Burch David Talbot Miles Osborne School on nformatics University of Edinburgh 2 Buccleuch Place Edinburgh

More information

Query Lattice for Translation Retrieval

Query Lattice for Translation Retrieval Query Lattice for Translation Retrieval Meiping Dong, Yong Cheng, Yang Liu, Jia Xu, Maosong Sun, Tatsuya Izuha, Jie Hao # State Key Laboratory of Intelligent Technology and Systems Tsinghua National Laboratory

More information

A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions

A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions Marcin Junczys-Dowmunt Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska 87, 61-614

More information

Predicting Sentence Translation Quality Using Extrinsic and Language Independent Features

Predicting Sentence Translation Quality Using Extrinsic and Language Independent Features Noname manuscript No. (will be inserted by the editor) Predicting Sentence Translation Quality Using Extrinsic and Language Independent Features Ergun Biçici Declan Groves Josef van Genabith Received:

More information

Reassessment of the Role of Phrase Extraction in PBSMT

Reassessment of the Role of Phrase Extraction in PBSMT Reassessment of the Role of Phrase Extraction in PBSMT Francisco Guzman Centro de Sistemas Inteligentes Tecnológico de Monterrey Monterrey, N.L., Mexico guzmanhe@gmail.com Qin Gao and Stephan Vogel Language

More information

SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation

SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation Marcin Junczys-Dowmunt, Arkadiusz Sza l Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska

More information

A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments

A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments Qin Gao, Nguyen Bach and Stephan Vogel Language Technologies Institute Carnegie Mellon University 000 Forbes Avenue, Pittsburgh

More information

Improving the Performance of Text Categorization using N-gram Kernels

Improving the Performance of Text Categorization using N-gram Kernels Improving the Performance of Text Categorization using N-gram Kernels Varsha K. V *., Santhosh Kumar C., Reghu Raj P. C. * * Department of Computer Science and Engineering Govt. Engineering College, Palakkad,

More information

The Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY

The Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY The Prague Bulletin of Mathematical Linguistics NUMBER 9 JANUARY 009 79 88 Z-MERT: A Fully Configurable Open Source Tool for Minimum Error Rate Training of Machine Translation Systems Omar F. Zaidan Abstract

More information

An efficient and user-friendly tool for machine translation quality estimation

An efficient and user-friendly tool for machine translation quality estimation An efficient and user-friendly tool for machine translation quality estimation Kashif Shah, Marco Turchi, Lucia Specia Department of Computer Science, University of Sheffield, UK {kashif.shah,l.specia}@sheffield.ac.uk

More information

Encoding Words into String Vectors for Word Categorization

Encoding Words into String Vectors for Word Categorization Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

Exam Marco Kuhlmann. This exam consists of three parts:

Exam Marco Kuhlmann. This exam consists of three parts: TDDE09, 729A27 Natural Language Processing (2017) Exam 2017-03-13 Marco Kuhlmann This exam consists of three parts: 1. Part A consists of 5 items, each worth 3 points. These items test your understanding

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

Architecture for Text Normalization using Statistical Machine Translation techniques

Architecture for Text Normalization using Statistical Machine Translation techniques Architecture for Text Normalization using Statistical Machine Translation techniques V. López-Ludeña, R. San-Segundo, J. M. Montero, R. Barra-Chicote, J. Lorenzo Speech Technology Group. E.T.S.I. Telecomunicación.

More information

Random Restarts in Minimum Error Rate Training for Statistical Machine Translation

Random Restarts in Minimum Error Rate Training for Statistical Machine Translation Random Restarts in Minimum Error Rate Training for Statistical Machine Translation Robert C. Moore and Chris Quirk Microsoft Research Redmond, WA 98052, USA bobmoore@microsoft.com, chrisq@microsoft.com

More information

Using Maximum Entropy for Automatic Image Annotation

Using Maximum Entropy for Automatic Image Annotation Using Maximum Entropy for Automatic Image Annotation Jiwoon Jeon and R. Manmatha Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst Amherst, MA-01003.

More information

Outline GIZA++ Moses. Demo. Steps Output files. Training pipeline Decoder

Outline GIZA++ Moses. Demo. Steps Output files. Training pipeline Decoder GIZA++ and Moses Outline GIZA++ Steps Output files Moses Training pipeline Decoder Demo GIZA++ A statistical machine translation toolkit used to train IBM Models 1-5 (moses only uses output of IBM Model-1)

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

Using Document Summarization Techniques for Speech Data Subset Selection

Using Document Summarization Techniques for Speech Data Subset Selection Using Document Summarization Techniques for Speech Data Subset Selection Kai Wei, Yuzong Liu, Katrin Kirchhoff, Jeff Bilmes Department of Electrical Engineering University of Washington Seattle, WA 98195,

More information

INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct

INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct 1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 8, 12 Oct. 2016 jtl@ifi.uio.no Today 2 Preparing bitext Parameter tuning Reranking Some linguistic issues STMT so far 3 We

More information

Automatic Summarization

Automatic Summarization Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization

More information

Log Linear Model for String Transformation Using Large Data Sets

Log Linear Model for String Transformation Using Large Data Sets Log Linear Model for String Transformation Using Large Data Sets Mr.G.Lenin 1, Ms.B.Vanitha 2, Mrs.C.K.Vijayalakshmi 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology,

More information

Adaptive Binary Quantization for Fast Nearest Neighbor Search

Adaptive Binary Quantization for Fast Nearest Neighbor Search IBM Research Adaptive Binary Quantization for Fast Nearest Neighbor Search Zhujin Li 1, Xianglong Liu 1*, Junjie Wu 1, and Hao Su 2 1 Beihang University, Beijing, China 2 Stanford University, Stanford,

More information

Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV

Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV Jan Vaněk and Josef V. Psutka Department of Cybernetics, West Bohemia University,

More information

PARALLELIZATION OF THE NELDER-MEAD SIMPLEX ALGORITHM

PARALLELIZATION OF THE NELDER-MEAD SIMPLEX ALGORITHM PARALLELIZATION OF THE NELDER-MEAD SIMPLEX ALGORITHM Scott Wu Montgomery Blair High School Silver Spring, Maryland Paul Kienzle Center for Neutron Research, National Institute of Standards and Technology

More information

Forest Reranking for Machine Translation with the Perceptron Algorithm

Forest Reranking for Machine Translation with the Perceptron Algorithm Forest Reranking for Machine Translation with the Perceptron Algorithm Zhifei Li and Sanjeev Khudanpur Center for Language and Speech Processing and Department of Computer Science Johns Hopkins University,

More information

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings.

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings. Online Review Spam Detection by New Linguistic Features Amir Karam, University of Maryland Baltimore County Bin Zhou, University of Maryland Baltimore County Karami, A., Zhou, B. (2015). Online Review

More information

Sparse Non-negative Matrix Language Modeling

Sparse Non-negative Matrix Language Modeling Sparse Non-negative Matrix Language Modeling Joris Pelemans Noam Shazeer Ciprian Chelba joris@pelemans.be noam@google.com ciprianchelba@google.com 1 Outline Motivation Sparse Non-negative Matrix Language

More information

Report for each of the weighted automata obtained ˆ the number of states; ˆ the number of ɛ-transitions;

Report for each of the weighted automata obtained ˆ the number of states; ˆ the number of ɛ-transitions; Mehryar Mohri Speech Recognition Courant Institute of Mathematical Sciences Homework assignment 3 (Solution) Part 2, 3 written by David Alvarez 1. For this question, it is recommended that you use the

More information

Unigrams Bigrams Trigrams

Unigrams Bigrams Trigrams IMPLEMENTATION OF WORD BASED STATISTICAL LANGUAGE MODELS Frank Wessel, Stefan Ortmanns and Hermann Ney Lehrstuhl fur Informatik VI, RWTH Aachen, Uniersity of Technology, D-5056 Aachen, Germany, Phone/Fax:

More information

Classification. 1 o Semestre 2007/2008

Classification. 1 o Semestre 2007/2008 Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class

More information

Recent Developments in Model-based Derivative-free Optimization

Recent Developments in Model-based Derivative-free Optimization Recent Developments in Model-based Derivative-free Optimization Seppo Pulkkinen April 23, 2010 Introduction Problem definition The problem we are considering is a nonlinear optimization problem with constraints:

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

A Quantitative Analysis of Reordering Phenomena

A Quantitative Analysis of Reordering Phenomena A Quantitative Analysis of Reordering Phenomena Alexandra Birch Phil Blunsom Miles Osborne a.c.birch-mayne@sms.ed.ac.uk pblunsom@inf.ed.ac.uk miles@inf.ed.ac.uk University of Edinburgh 10 Crichton Street

More information

Discriminative Reranking for Machine Translation

Discriminative Reranking for Machine Translation Discriminative Reranking for Machine Translation Libin Shen Dept. of Comp. & Info. Science Univ. of Pennsylvania Philadelphia, PA 19104 libin@seas.upenn.edu Anoop Sarkar School of Comp. Science Simon Fraser

More information

Scalable Trigram Backoff Language Models

Scalable Trigram Backoff Language Models Scalable Trigram Backoff Language Models Kristie Seymore Ronald Rosenfeld May 1996 CMU-CS-96-139 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 This material is based upon work

More information

Inclusion of large input corpora in Statistical Machine Translation

Inclusion of large input corpora in Statistical Machine Translation Inclusion of large input corpora in Statistical Machine Translation Bipin Suresh Stanford University bipins@stanford.edu ABSTRACT In recent years, the availability of large, parallel, bilingual corpora

More information

Lexicographic Semirings for Exact Automata Encoding of Sequence Models

Lexicographic Semirings for Exact Automata Encoding of Sequence Models Lexicographic Semirings for Exact Automata Encoding of Sequence Models Brian Roark, Richard Sproat, and Izhak Shafran {roark,rws,zak}@cslu.ogi.edu Abstract In this paper we introduce a novel use of the

More information

Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines

Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines Kristina Toutanova Microsoft Research Redmond, WA 98502 kristout@microsoft.com Byung-Gyu Ahn Johns Hopkins University

More information

Complex Prediction Problems

Complex Prediction Problems Problems A novel approach to multiple Structured Output Prediction Max-Planck Institute ECML HLIE08 Information Extraction Extract structured information from unstructured data Typical subtasks Named Entity

More information

A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation.

A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation. A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation May 29, 2003 Shankar Kumar and Bill Byrne Center for Language and Speech Processing

More information

BBN System Description for WMT10 System Combination Task

BBN System Description for WMT10 System Combination Task BBN System Description for WMT10 System Combination Task Antti-Veikko I. Rosti and Bing Zhang and Spyros Matsoukas and Richard Schwartz Raytheon BBN Technologies, 10 Moulton Street, Cambridge, MA 018,

More information

Tracking of Human Body using Multiple Predictors

Tracking of Human Body using Multiple Predictors Tracking of Human Body using Multiple Predictors Rui M Jesus 1, Arnaldo J Abrantes 1, and Jorge S Marques 2 1 Instituto Superior de Engenharia de Lisboa, Postfach 351-218317001, Rua Conselheiro Emído Navarro,

More information

Regularization and model selection

Regularization and model selection CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial

More information

Joint Decoding with Multiple Translation Models

Joint Decoding with Multiple Translation Models Joint Decoding with Multiple Translation Models Yang Liu and Haitao Mi and Yang Feng and Qun Liu Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of

More information

Application of Support Vector Machine Algorithm in Spam Filtering

Application of Support Vector Machine Algorithm in  Spam Filtering Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification

More information

Local Phrase Reordering Models for Statistical Machine Translation

Local Phrase Reordering Models for Statistical Machine Translation Local Phrase Reordering Models for Statistical Machine Translation Shankar Kumar, William Byrne Center for Language and Speech Processing, Johns Hopkins University, 3400 North Charles Street, Baltimore,

More information

Kernel Density Construction Using Orthogonal Forward Regression

Kernel Density Construction Using Orthogonal Forward Regression Kernel ensity Construction Using Orthogonal Forward Regression S. Chen, X. Hong and C.J. Harris School of Electronics and Computer Science University of Southampton, Southampton SO7 BJ, U.K. epartment

More information

Shrinking Exponential Language Models

Shrinking Exponential Language Models Shrinking Exponential Language Models Stanley F. Chen IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights, NY 10598 stanchen@watson.ibm.com Abstract In (Chen, 2009), we show that for a variety

More information

Supervised classification of law area in the legal domain

Supervised classification of law area in the legal domain AFSTUDEERPROJECT BSC KI Supervised classification of law area in the legal domain Author: Mees FRÖBERG (10559949) Supervisors: Evangelos KANOULAS Tjerk DE GREEF June 24, 2016 Abstract Search algorithms

More information

Automatic Domain Partitioning for Multi-Domain Learning

Automatic Domain Partitioning for Multi-Domain Learning Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels

More information

CPSC 340: Machine Learning and Data Mining. Multi-Class Classification Fall 2017

CPSC 340: Machine Learning and Data Mining. Multi-Class Classification Fall 2017 CPSC 340: Machine Learning and Data Mining Multi-Class Classification Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation

More information

27: Hybrid Graphical Models and Neural Networks

27: Hybrid Graphical Models and Neural Networks 10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look

More information

Computing Optimal Alignments for the IBM-3 Translation Model

Computing Optimal Alignments for the IBM-3 Translation Model 14th Conference on Computational Natural Language Learning, Uppsala, Sweden, 2010 Computing Optimal Alignments for the IBM-3 Translation Model Thomas Schoenemann Centre for Mathematical Sciences Lund University,

More information

Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model

Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model TAE IN SEOL*, SUN-TAE CHUNG*, SUNHO KI**, SEONGWON CHO**, YUN-KWANG HONG*** *School of Electronic Engineering

More information

arxiv: v1 [cs.cv] 2 May 2016

arxiv: v1 [cs.cv] 2 May 2016 16-811 Math Fundamentals for Robotics Comparison of Optimization Methods in Optical Flow Estimation Final Report, Fall 2015 arxiv:1605.00572v1 [cs.cv] 2 May 2016 Contents Noranart Vesdapunt Master of Computer

More information

Divide and Conquer Kernel Ridge Regression

Divide and Conquer Kernel Ridge Regression Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem

More information

Semantic Word Embedding Neural Network Language Models for Automatic Speech Recognition

Semantic Word Embedding Neural Network Language Models for Automatic Speech Recognition Semantic Word Embedding Neural Network Language Models for Automatic Speech Recognition Kartik Audhkhasi, Abhinav Sethy Bhuvana Ramabhadran Watson Multimodal Group IBM T. J. Watson Research Center Motivation

More information

BLANC: Learning Evaluation Metrics for MT

BLANC: Learning Evaluation Metrics for MT BLANC: Learning Evaluation Metrics for MT Lucian Vlad Lita and Monica Rogati and Alon Lavie Carnegie Mellon University {llita,mrogati,alavie}@cs.cmu.edu Abstract We introduce BLANC, a family of dynamic,

More information

Modern Methods of Data Analysis - WS 07/08

Modern Methods of Data Analysis - WS 07/08 Modern Methods of Data Analysis Lecture XV (04.02.08) Contents: Function Minimization (see E. Lohrmann & V. Blobel) Optimization Problem Set of n independent variables Sometimes in addition some constraints

More information

Database-Text Alignment via Structured Multilabel Classification

Database-Text Alignment via Structured Multilabel Classification Database-Text Alignment via Structured Multilabel Classification Benjamin Snyder and Regina Barzilay Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology {bsnyder,regina}@csail.mit.edu

More information

Constrained Discriminative Training of N-gram Language Models

Constrained Discriminative Training of N-gram Language Models Constrained Discriminative Training of N-gram Language Models Ariya Rastrow #1, Abhinav Sethy 2, Bhuvana Ramabhadran 3 # Human Language Technology Center of Excellence, and Center for Language and Speech

More information

The Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY Memory-Based Machine Translation and Language Modeling

The Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY Memory-Based Machine Translation and Language Modeling The Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY 2009 17 26 Memory-Based Machine Translation and Language Modeling Antal van den Bosch, Peter Berck Abstract We describe a freely available

More information

WebMining: An unsupervised parallel corpora web retrieval system

WebMining: An unsupervised parallel corpora web retrieval system WebMining: An unsupervised parallel corpora web retrieval system Jesús Tomás Instituto Tecnológico de Informática Universidad Politécnica de Valencia jtomas@upv.es Jaime Lloret Dpto. de Comunicaciones

More information

On Hierarchical Re-ordering and Permutation Parsing for Phrase-based Decoding

On Hierarchical Re-ordering and Permutation Parsing for Phrase-based Decoding On Hierarchical Re-ordering and Permutation Parsing for Phrase-based Decoding Colin Cherry National Research Council colin.cherry@nrc-cnrc.gc.ca Robert C. Moore Google bobmoore@google.com Chris Quirk Microsoft

More information

First Steps Towards Coverage-Based Document Alignment

First Steps Towards Coverage-Based Document Alignment First Steps Towards Coverage-Based Document Alignment Luís Gomes 1, Gabriel Pereira Lopes 1, 1 NOVA LINCS, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal ISTRION BOX, Translation

More information

Bayesian Classification Using Probabilistic Graphical Models

Bayesian Classification Using Probabilistic Graphical Models San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2014 Bayesian Classification Using Probabilistic Graphical Models Mehal Patel San Jose State University

More information

System Combination Using Joint, Binarised Feature Vectors

System Combination Using Joint, Binarised Feature Vectors System Combination Using Joint, Binarised Feature Vectors Christian F EDERMAN N 1 (1) DFKI GmbH, Language Technology Lab, Stuhlsatzenhausweg 3, D-6613 Saarbrücken, GERMANY cfedermann@dfki.de Abstract We

More information

Stabilizing Minimum Error Rate Training

Stabilizing Minimum Error Rate Training Stabilizing Minimum Error Rate Training George Foster and Roland Kuhn National Research Council Canada first.last@nrc.gc.ca Abstract The most commonly used method for training feature weights in statistical

More information

A New Combinatorial Design of Coded Distributed Computing

A New Combinatorial Design of Coded Distributed Computing A New Combinatorial Design of Coded Distributed Computing Nicholas Woolsey, Rong-Rong Chen, and Mingyue Ji Department of Electrical and Computer Engineering, University of Utah Salt Lake City, UT, USA

More information

COMBINING FEATURE SETS WITH SUPPORT VECTOR MACHINES: APPLICATION TO SPEAKER RECOGNITION

COMBINING FEATURE SETS WITH SUPPORT VECTOR MACHINES: APPLICATION TO SPEAKER RECOGNITION COMBINING FEATURE SETS WITH SUPPORT VECTOR MACHINES: APPLICATION TO SPEAKER RECOGNITION Andrew O. Hatch ;2, Andreas Stolcke ;3, and Barbara Peskin The International Computer Science Institute, Berkeley,

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Introduction to Statistical Learning Theory

Introduction to Statistical Learning Theory Introduction to Statistical Learning Theory Julia Kempe & David S. Rosenberg NYU CDS January 29, 2019 Julia Kempe & David S. Rosenberg (NYU CDS) DS-GA 1003 / CSCI-GA 2567 January 29, 2019 1 / 29 Decision

More information

On the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes

On the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes Mary Melekopoglou Anne Condon Computer Sciences Department University of Wisconsin - Madison 0 West Dayton Street Madison,

More information

Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation

Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation Markus Dreyer, Keith Hall, and Sanjeev Khudanpur Center for Language and Speech Processing Johns Hopkins University 300

More information

Explicit fuzzy modeling of shapes and positioning for handwritten Chinese character recognition

Explicit fuzzy modeling of shapes and positioning for handwritten Chinese character recognition 2009 0th International Conference on Document Analysis and Recognition Explicit fuzzy modeling of and positioning for handwritten Chinese character recognition Adrien Delaye - Eric Anquetil - Sébastien

More information

LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS

LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS Tara N. Sainath, Brian Kingsbury, Vikas Sindhwani, Ebru Arisoy, Bhuvana Ramabhadran IBM T. J. Watson

More information

Noise filtering for television receivers with reduced memory

Noise filtering for television receivers with reduced memory Noise filtering for television receivers with reduced memory R. J. Schutten, G. de Haan and A. H. M. van Roermund. Philips Research Laboratories, Television Systems Group, Prof. Holstlaan 4, 5656 AA Eindhoven,

More information