On a Kernel Regression Approach to Machine Translation
|
|
- Brett McCarthy
- 5 years ago
- Views:
Transcription
1 On a Kernel Regression Approach to Machine Translation Nicolás Serrano, Jesús Andrés-Ferrer, and Francisco Casacuberta Instituto Tecnológico de Informática {nserrano,jandres,fcn}@iti.upv.es Abstract. We present a machine translation framework based on Kernel Regression techniques. The translation process is modeled as a stringto-string mapping. For doing so, first both source and target strings are mapped to a natural vector space obtaining feature vectors. Afterwards, a translation mapping is defined from the source feature vector to the target feature vector. This translation mapping is learnt by linear regression. Once the target feature vector is obtained, we use a multi-graph search to find all the possible target strings whose mappings correspond to the translated feature vector. We present experiments in a small but relevant task showing encouraging results. 1 Introduction Machine translation deals with the problem of mapping sentences x from a source language X to a target language Y. Due to the complexity of the translation problem, the relationships between these two languages cannot be properly enumerated as a set of rules. Consequently, several authors have approached this problem as a statistical pattern recognition problem [1]. However, we present a new method for machine translation based on the work in [2], where a regression framework for learning transductions is proposed. Similar methods have been applied to many fields such as text categorization [4] or machine translation [5]. In our approach, both source and target strings are mapped to natural feature vectors. Then a translation mapping is learnt between feature vector domains. Given a source string x, the translation process consists in mapping it to a feature vector u and then, mapping this vector u to its corresponding target feature vector v. The latter mapping, the so-called translation mapping is learnt by regression techniques. Finally, the pre-image set for the target feature vector v must be found. This problem is referred as the Pre-image problem. The focus of this paper is to solve this problem in a regression-based machine translation framework. Some previous works such as [5], have explore the idea of learning the translation as a regression problem. These works do not handle the pre-image problem directly but use the model as a score to the standard statistical machine translation systems. Specifically, they use a phrased-based statistical machine translation model [6] and use the kernel regression model as score to the phrased-based
2 2 N. Serrano, J. Andres-Ferrer and F. Casacuberta search. This approach does not make the best of the regression approach, as proposed in [2], losing some of its main advantages. On the contrary, the pre-image search proposed in [2] is adapted in this work to the peculiarities in the machine translation problem. This aim is achieved by building the DeBruijn Graph [7] for a target feature vector and then finding eulerian paths within it. Nevertheless, due to the high dimensionality of the feature vectors, severe problems arise. 2 The Training Process The aim of machine translation is to learn a mapping from a source language X to a target language Y, i.e. f : X Y, where X is the source vocabulary and Y is the target vocabulary. In statistical machine translation the optimal translation function f is designed based on Bayes decision theory [1] where p(y x) is approximated by f(x) = arg maxp(y x) (1) y Y exp(c(x, y)) p(y x) = y Y exp(c(x, y )) (2) since the exponential function is monotonically increasing, Eq. (1) simplifies to f(x) = argmaxc(x, y) (3) y Y where C(x, y) is a score function, which is modelled by a feature set {h k } K 1 of theoretically and heuristically motivated functions [8] C(x, y) = k λ k h k (x, y) (4) where λ k are the model parameters that weight the feature function relevance. In state-of-art translation systems [9], these feature functions are mainly modelled by logarithms of probability models, such as the phrase-based models [6]. However, we propose a method for learning the translation function f based on regression techniques. In order to configure the regression framework, the source strings x X are mapped to the natural domain u U N D1 via a source mapping function φ X, i.e. u = φ X (x). Similarly, the target strings y Y are mapped to another natural domain, v V N D2, via a target mapping function φ Y, i.e. φ Y (y). Although both source and target mappings are not required to be of the same type, we will henceforth assume so. Then, we define the mappings, φ X and φ Y, as the function, φ n, that generates a n- gram count vector from a given string, x and y respectively. More accurately, φ n (x) = { x u1,..., x un } with u i standing for the i-th n-gram in lexicografic
3 On a Kernel Regression Approach to Machine Translation 3 order of the vocabulary X, and x u is the number of occurrences of u in x. For instance, the string x = aaabb of the language X = {a, b} will correspond to the bigram mapping output u = φ 2 (x) = (2, 1, 0, 1). The feature vector mapping u U N D1 is useful when comparing strings, since it helps to define a kernel between strings as follows: K n (x, x ) = φ n (x)φ n (x ) T = u Σ n x u x u (5) where K n (x, x ) ranges from the number of common n-grams if both strings are equal, to zero if they totally differ. Once the string-to-vector mapping is defined, the training problem is restated as a regression problem where a source-to-target mapping must to be found, i.e. finding the translation mapping h : U V. Given such a mapping h and two sentences, a source sentence x and its translation y; we define a string to target feature mapping, g : X V, as follows g(x) = h(φ X (x)) = φ Y (y) (6) Given a source string x, the proposed translation method consists in mapping it to U via φ X (x) and then, mapping it to V with the translation function h(x). Afterwards, given the target feature vector v obtained from the translation mapping, we compute its pre-image set φ 1 Y (v). Figure 1 depicts a general scheme of the whole translation process. X φ X( ) U f( ) g( ) h( ) Y φ Y ( ) φ 1 Y ( ) V Fig. 1. Machine translation kernel regression scheme 2.1 The Linear Regression The function h maps between two natural domains h : N D1 N D2. Since discrete regression problems yield complicated training algorithms, we learn an extension of this function h that approximates the natural domains by real domains, i.e. h : R D1 R D2. We further assume that our regression problem is linear, that is to say, that the mapping function h(u), and hereby its extension h, can be aproximated with a linear function, i.e. h(u) = Wu. Note that in this case the string-to-feature vector mapping g, is simplified to g(x) = Wφ X (x). Given a set of sentences, (x m, y m ) M 1, we can learn the optimal Ŵ matrix using
4 4 N. Serrano, J. Andres-Ferrer and F. Casacuberta the regularised least square as follows Ŵ = arg min W M Wφ X (x m ) φ Y (y m ) 2 + γ W 2 F (7) m=1 where F refers to the Frobenius norm and where γ > 0 is the regularization term. The aim of the regularisation term is to avoid the weights matrix W to contain large weights, which is a clear evidence of overtraining. The solution to Eq. (7) is unique and is found by differentiating the expression and equaling it to zero Ŵ = M Y (K X + γi) 1 M T X (8) where M Y = [φ Y (y 1 ),..., φ Y (y M )] is the D 2 M matrix of which j-th column vector is φ Y (y j ), where M X = [φ X (x 1 ),..., φ X (x M )] is the analogous for the source strings, and where K X is the Gram matrix associated to the kernel K n applied to the source samples, i.e. [K X ] ij = K n (x i, x j ) for all i, j such that 1 i, j M. 3 The decoding Problem In the proposed framework, the regression output for a given source string x, is a feature vector v which represents the occurrence counts of each target n-gram in its respective target translation y. Any target string y, that produces the same count vector v by means of the target mapping φ Y (y) is regarded as a possible translation of the source sentence x. Therefore, the decoding problem or the pre-image problem is theoretically constrained to a subdomain of Y. The decoding or pre-image problem is stated as the problem of finding which target sentences are represented as given a feature vector, i.e., to find the set (g(x)). The pre-image of g(x) is not unique since each reordering of the same counts leads to the same target feature vector v. To further understand the problem we give a simple example. We assume the target language in the example is given by Y = {a, b} and that the mapping function is φ 2, i.e. counting the number of possible bigrams. This implies that the count vector has four dimensions, one for each of the possible bigrams {aa, ab, ba, bb}. Since the dimensions are sorted in lexicographic order, the first dimension v 1 represents the occurrences of the bigram aa, the second dimension v 2, the occurrences of ab; and so on. If the regression output for a source sentence x is v = g(x) = {2, 1, 1, 0}, φ 1 Y then its pre-image set is φ 1 Y (v) = {aaaba, aabaa, baaab}. When dealing with natural feature vectors the pre-image problem has a wellknown solution [7]. First, it is needed to build the so-called DeBruijn graph as follows: all the (n 1)-gram sequences represent a different node in the graph, and edges are added going from a node, a 1 a 2... a n 1, to a node, a 2... a n 1 a n, if they have a positive weight, which is the count of the n-gram a 1... a n in v. In
5 On a Kernel Regression Approach to Machine Translation 5 2 a 1 1 b Fig.2. A DeBruijn graph for bigrams and the feature vector v = {2, 1,1, 0}. this graph, each Eulerian path 1 corresponds to a target string in the pre-image set. The DeBruijn graph of the proposed example is shown Fig. 2. At this point several problems arise in practice: The translation regresion is real-valued h instead of natural-valued h. Eulerian paths are not correctly defined in this case, since the DeBruijn technique cannot be directly applied in real-valued feature vectors. There are unknown source and target n-grams, those not appearing at the training samples. This makes the target feature vector v not to define a unique connected DeBruijn graph, but one with multiple isolated connected components. Even if we obtain all the possible eulerian paths within the graph, that represents all the possible translation, there is no way to select the proper translation corresponding to the source string. 3.1 The Aliasing Problem The result obtained from the regression is a real-valued g function where the meaning of each of the target vector dimensions v k is not clear. For instance, we take the example in which the target vocabulary is given by Y = {a, b} and a bigram mapping is used φ 2 ( ). In this example, the obtained target feature vector for a given source string could be v = (1.2, 0.5, 0.8, 0.25) instead of the perfect regression v = (2, 1, 1, 0). For deeply understanding the implication of any approximation, we must analyse the effect of variations in the natural-valued feature vector. Assuming that the correct regresion is given by v = (2, 1, 1, 0) then the pre-image set is φ 1 (v) = {abaa, aaba, baaab}. In the case in which the regresion produces v = (2, 1, 1, 1), the pre-image set changes to φ 1 (v) = {abbaa, aabba, baaabb, bbaaab}. Therefore, adding (or subtracting) 1 to any count dimension incurs in one extra (or less) word. The search method originally proposed in [2] is to round the real vector and then, build a DeBruijn graph from it and search eulerian paths whittin the graph. However, the best solution takes into account the previously discussed regression errors. For this aim we define the Levensthein loan 2 as the real increment or 1 A path inside a graph visiting each edge the number of times indicated by its weight 2 Named after the Levensthein distance [10]
6 6 N. Serrano, J. Andres-Ferrer and F. Casacuberta decrement that must be added to a given real vector in order to convert it to the natural domain. The Levensthein loan can be understood as the average Levensthein error of the correct hypothesis with respect to unknown reference. In order to find the pre-image for real-valued feature vectors, we build a weighted graph in a similar way the DeBruijn graph is built. The edges represent the real count of each n-gram according to the real-valued feature vector, instead of actual natural counts. During the search, the Levensthein loan is used to add or substract any necessary amount to the eulerian path simulating, in this way, a natural vector. In summary, the search algorithm looks for a path that uses the more of the weights in the graph and ask for the lowest possible loan. 3.2 The Unknown n-grams Problem In practice, it is common to find unknown n-grams when mapping strings to the feature vector count domain. This problem is stressed as the n becomes larger. Usually, this leads to DeBruijn graphs with isolated connected components and without eulerian paths covering the full sentence. A way to amend the problem is to apply backing-off to lower order models during the search. That is to say, when searching for a path, the search is simultaneously performed in different l-gram graphs (1 l n). Higher values of l are more restrictive during the search and also capture more structural dependencies but more number of parameters must be estimated in the regression. When the search at the highest order graph cannot continue building an eulerian path, the search backs off one order level with a penalisation factor, similarly to backing-off smoothing in language modeling [11]. This process goes on recursively until the unigram level if needed. 3.3 Adding Scores to the Search Lenvensthein loan score is not enough to select just one target string from the real-valued feature vector. Recall that even in the theoretical situation this would not be possible since several sentences can be built from different reordering of the same n-gram counts. These sentences have the same loan and therefore, we have no way to discriminate among them. Obviously, this problem is solved by adding an additional score to the target string y and consequently to each path in the pre-image search. Although, several scores can be proposed such as the probability given by statistical translations models, we have adopted in this first work a language model probability, specifically the n-gram language model [11]. In summary, the score in Eq. (3) for a given pair (x, y) is defined as follows for the proposed kernel regression model ( N ) C(x, y) = λ i sc i (x, y) + λ lm p n (y) + λ l exp ( y α) (9) i=1 where sc i is the Levensthein loan for the target string y at the i-th level graph; p n (y) is the language probability model for the target sentence y, and the last
7 On a Kernel Regression Approach to Machine Translation 7 term represents a word-bonus score to counteract the language model bias towards short sentences. 4 Experimental Results We have carried out experiments on the categorized EuTrans corpus [12]. This corpus comprises tourist requests at hotel front desks. Categorized EuTrans consists in pair of sentences divided into three sets: 9000 sentences to train, 1000 sentences were left out for validation and 3000 sentences for testing. There are 420 source and 230 target categories. The corpora perplexity is 3.95 and 4.73 for the target and source language respectively. Three kernel n-gram models where trained: one built from bigrams consisting on 2145 source bigrams and 929 target bigrams, another built from trigrams consisting on 5048 source trigrams and 2102 target trigrams; and the concatenation of both as described in 3.2. We trained increasing sizes of n-gram language models from 2 to 5 estimated by the SRILM toolkit [13]. The score weights, λ, in Eq. (9) were adjusted by the Downhill Simplex algorithm [14] for optimizing the BLEU score [15] on the validation set. The proposed system was compared with the Moses [9] baseline system, in which we have limited the phrase length to 7 allowing reordering and optimizing the parameters on a validation set. The Table 1. Results on categorized EuTrans. N stands for the order of the kernel whilst the LM order row stands for the order of the n-gram language model (0 no LM). Model N = 2 N = 3 N = 3 + (N = 2) Backoff LM order BLEU Moses baseline system scored 92.3 points of BLEU compared to the 95.5 points of our best system. A practical behaviour analysis of the proposed method is enclosed in the table 1. The results on this simple task are encouraging since almost all results surpass the baseline system. It can be observed that adding language information considerably improves the bigram system. However the trigram system is not benefited from the language model information, probably because of the corpus simplicity. Finally as expected, the search smoothing with back-off improves the results. 5 Conclusions and Future Work Kernel regression models are a new and encouraging approach to the machine translation field. In this work we have proposed a complete regression framework to machine translation by proposing a regression-based search. In contrast, other works perform the search by means of a phrase-based decoding. In addition, we
8 8 N. Serrano, J. Andres-Ferrer and F. Casacuberta have explored the idea of adding language information to rank the target strings among all the pre-image set. In a future work, different models apart from the n-gram language model will be added to rank the target strings, such as IBM word models and phrased-based models. Nevertheless several problems arise when complex corpus are used. To deal with them, further optimizations such as perform the model estimation through Cholesky incomplete decomposition or subset selection techniques are left for further research. Other possible improvements in the process would be using different kernel functions for comparing strings and use quadratic regression instead of linear regression as the regression model. Acknowledgments This work has been supported by the Spanish MEC under FPU scholarship AP , the valencian Conselleria d Empresa i Cincia grant CTBPRA/2005/004, grant Consolider Ingenio 2010 CSD by the EC (FEDER) and the Spanish MEC under grant TIN CO2-01. References 1. P.F. Brown et al. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistic. 19(2). pp (1993) 2. C. Cortes, M. Mohri, and J. Weston. A general regression technique for learning transductions. Proc. of the 22nd international conference on Machine learning, C. Cortes, M. Mohri, and J. Weston. A General Regression Framework for Learning String-to-String Mappings. In Predicting Structured Data. The MIT Press, N. Cancedda, E. Gaussier, C. Goutte, J. Render: Word-Sequence Kernels. Journal of Machine Learning Research 3. pp (2003) 5. Z. Wang and J. Shawe-Taylor: Kernel Regression Based Machine Translation. NAACL HLT 2007, Companion Volume, pp P. Koehn, F.J. Och and D. Marcu. Statistical phrase based translation. In Proceedings of HLT/NACL. (2003) 7. J.L. Gross and J. Yellen. Handbook of Graph Theory. Published by CRC Press. Pp ISBN: Franz Josef Och, Hermann Ney. The alignment template approach to statistical machine translation. Computational Linguistics, vol. 30(4), pp P. Koehn. H. Hoang. A. Birch and C. Callison-Burch. Moses: Open Source Toolkit for Statistical Machine Translation. Proc. of ACL 07 pp V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, pp (1966) 11. J.T. Goodman.An empirical study of smoothing techniques for language modelling. In Proc. of ACL 96, pp F. Casacuberta et al. Some approaches to statistical and finite-state speech-tospeech tranlation. Computer Speech and Language 18. pp (2004) 13. SRILM - An Extensible Language Modeling Toolkit, in Proc. Intl. Conf. Spoken Language Processing, Denver, Colorado, September J. A. Nelder and R. Mead A Simplex Method for Function Minimization The Computer Journal Vol 7, pp (1965) 15. Papineni, K., Roukos, S., Ward, T., and Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation in ACL-2002: pp
RegMT System for Machine Translation, System Combination, and Evaluation
RegMT System for Machine Translation, System Combination, and Evaluation Ergun Biçici Koç University 34450 Sariyer, Istanbul, Turkey ebicici@ku.edu.tr Deniz Yuret Koç University 34450 Sariyer, Istanbul,
More informationTuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017
Tuning Philipp Koehn presented by Gaurav Kumar 28 September 2017 The Story so Far: Generative Models 1 The definition of translation probability follows a mathematical derivation argmax e p(e f) = argmax
More informationTHE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM
THE CUED NIST 2009 ARABIC-ENGLISH SMT SYSTEM Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Jamie Brunning, Bill Byrne NIST Open MT 2009 Evaluation Workshop Ottawa, The CUED SMT system Lattice-based
More informationStatistical Machine Translation Part IV Log-Linear Models
Statistical Machine Translation art IV Log-Linear Models Alexander Fraser Institute for Natural Language rocessing University of Stuttgart 2011.11.25 Seminar: Statistical MT Where we have been We have
More informationWord Graphs for Statistical Machine Translation
Word Graphs for Statistical Machine Translation Richard Zens and Hermann Ney Chair of Computer Science VI RWTH Aachen University {zens,ney}@cs.rwth-aachen.de Abstract Word graphs have various applications
More informationDiscriminative Training for Phrase-Based Machine Translation
Discriminative Training for Phrase-Based Machine Translation Abhishek Arun 19 April 2007 Overview 1 Evolution from generative to discriminative models Discriminative training Model Learning schemes Featured
More informationStone Soup Translation
Stone Soup Translation DJ Hovermale and Jeremy Morris and Andrew Watts December 3, 2005 1 Introduction 2 Overview of Stone Soup Translation 2.1 Finite State Automata The Stone Soup Translation model is
More informationTraining and Evaluating Error Minimization Rules for Statistical Machine Translation
Training and Evaluating Error Minimization Rules for Statistical Machine Translation Ashish Venugopal School of Computer Science Carnegie Mellon University arv@andrew.cmu.edu Andreas Zollmann School of
More informationPBML. logo. The Prague Bulletin of Mathematical Linguistics NUMBER??? FEBRUARY Improved Minimum Error Rate Training in Moses
PBML logo The Prague Bulletin of Mathematical Linguistics NUMBER??? FEBRUARY 2009 1 11 Improved Minimum Error Rate Training in Moses Nicola Bertoldi, Barry Haddow, Jean-Baptiste Fouet Abstract We describe
More informationDiscriminative Training with Perceptron Algorithm for POS Tagging Task
Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu
More informationPower Mean Based Algorithm for Combining Multiple Alignment Tables
Power Mean Based Algorithm for Combining Multiple Alignment Tables Sameer Maskey, Steven J. Rennie, Bowen Zhou IBM T.J. Watson Research Center {smaskey, sjrennie, zhou}@us.ibm.com Abstract Alignment combination
More informationDiscriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition
Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition by Hong-Kwang Jeff Kuo, Brian Kingsbury (IBM Research) and Geoffry Zweig (Microsoft Research) ICASSP 2007 Presented
More informationLANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING
LANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING Joshua Goodman Speech Technology Group Microsoft Research Redmond, Washington 98052, USA joshuago@microsoft.com http://research.microsoft.com/~joshuago
More informationOn LM Heuristics for the Cube Growing Algorithm
On LM Heuristics for the Cube Growing Algorithm David Vilar and Hermann Ney Lehrstuhl für Informatik 6 RWTH Aachen University 52056 Aachen, Germany {vilar,ney}@informatik.rwth-aachen.de Abstract Current
More informationTuning Machine Translation Parameters with SPSA
Tuning Machine Translation Parameters with SPSA Patrik Lambert, Rafael E. Banchs TALP Research Center, Jordi Girona Salgado 1 3. 0803 Barcelona, Spain lambert,rbanchs@gps.tsc.upc.edu Abstract Most of statistical
More informationOpinion Mining by Transformation-Based Domain Adaptation
Opinion Mining by Transformation-Based Domain Adaptation Róbert Ormándi, István Hegedűs, and Richárd Farkas University of Szeged, Hungary {ormandi,ihegedus,rfarkas}@inf.u-szeged.hu Abstract. Here we propose
More informationStatistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora
Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora Chris Callison-Burch David Talbot Miles Osborne School on nformatics University of Edinburgh 2 Buccleuch Place Edinburgh
More informationQuery Lattice for Translation Retrieval
Query Lattice for Translation Retrieval Meiping Dong, Yong Cheng, Yang Liu, Jia Xu, Maosong Sun, Tatsuya Izuha, Jie Hao # State Key Laboratory of Intelligent Technology and Systems Tsinghua National Laboratory
More informationA Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions
A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions Marcin Junczys-Dowmunt Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska 87, 61-614
More informationPredicting Sentence Translation Quality Using Extrinsic and Language Independent Features
Noname manuscript No. (will be inserted by the editor) Predicting Sentence Translation Quality Using Extrinsic and Language Independent Features Ergun Biçici Declan Groves Josef van Genabith Received:
More informationReassessment of the Role of Phrase Extraction in PBSMT
Reassessment of the Role of Phrase Extraction in PBSMT Francisco Guzman Centro de Sistemas Inteligentes Tecnológico de Monterrey Monterrey, N.L., Mexico guzmanhe@gmail.com Qin Gao and Stephan Vogel Language
More informationSyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation
SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation Marcin Junczys-Dowmunt, Arkadiusz Sza l Faculty of Mathematics and Computer Science Adam Mickiewicz University ul. Umultowska
More informationA Semi-supervised Word Alignment Algorithm with Partial Manual Alignments
A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments Qin Gao, Nguyen Bach and Stephan Vogel Language Technologies Institute Carnegie Mellon University 000 Forbes Avenue, Pittsburgh
More informationImproving the Performance of Text Categorization using N-gram Kernels
Improving the Performance of Text Categorization using N-gram Kernels Varsha K. V *., Santhosh Kumar C., Reghu Raj P. C. * * Department of Computer Science and Engineering Govt. Engineering College, Palakkad,
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY
The Prague Bulletin of Mathematical Linguistics NUMBER 9 JANUARY 009 79 88 Z-MERT: A Fully Configurable Open Source Tool for Minimum Error Rate Training of Machine Translation Systems Omar F. Zaidan Abstract
More informationAn efficient and user-friendly tool for machine translation quality estimation
An efficient and user-friendly tool for machine translation quality estimation Kashif Shah, Marco Turchi, Lucia Specia Department of Computer Science, University of Sheffield, UK {kashif.shah,l.specia}@sheffield.ac.uk
More informationEncoding Words into String Vectors for Word Categorization
Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:
More informationA Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition
A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es
More informationLearning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li
Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,
More informationA Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition
A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es
More informationExam Marco Kuhlmann. This exam consists of three parts:
TDDE09, 729A27 Natural Language Processing (2017) Exam 2017-03-13 Marco Kuhlmann This exam consists of three parts: 1. Part A consists of 5 items, each worth 3 points. These items test your understanding
More informationString Vector based KNN for Text Categorization
458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research
More informationArchitecture for Text Normalization using Statistical Machine Translation techniques
Architecture for Text Normalization using Statistical Machine Translation techniques V. López-Ludeña, R. San-Segundo, J. M. Montero, R. Barra-Chicote, J. Lorenzo Speech Technology Group. E.T.S.I. Telecomunicación.
More informationRandom Restarts in Minimum Error Rate Training for Statistical Machine Translation
Random Restarts in Minimum Error Rate Training for Statistical Machine Translation Robert C. Moore and Chris Quirk Microsoft Research Redmond, WA 98052, USA bobmoore@microsoft.com, chrisq@microsoft.com
More informationUsing Maximum Entropy for Automatic Image Annotation
Using Maximum Entropy for Automatic Image Annotation Jiwoon Jeon and R. Manmatha Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst Amherst, MA-01003.
More informationOutline GIZA++ Moses. Demo. Steps Output files. Training pipeline Decoder
GIZA++ and Moses Outline GIZA++ Steps Output files Moses Training pipeline Decoder Demo GIZA++ A statistical machine translation toolkit used to train IBM Models 1-5 (moses only uses output of IBM Model-1)
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationAn Empirical Study of Lazy Multilabel Classification Algorithms
An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
More informationUsing Document Summarization Techniques for Speech Data Subset Selection
Using Document Summarization Techniques for Speech Data Subset Selection Kai Wei, Yuzong Liu, Katrin Kirchhoff, Jeff Bilmes Department of Electrical Engineering University of Washington Seattle, WA 98195,
More informationINF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct
1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 8, 12 Oct. 2016 jtl@ifi.uio.no Today 2 Preparing bitext Parameter tuning Reranking Some linguistic issues STMT so far 3 We
More informationAutomatic Summarization
Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization
More informationLog Linear Model for String Transformation Using Large Data Sets
Log Linear Model for String Transformation Using Large Data Sets Mr.G.Lenin 1, Ms.B.Vanitha 2, Mrs.C.K.Vijayalakshmi 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology,
More informationAdaptive Binary Quantization for Fast Nearest Neighbor Search
IBM Research Adaptive Binary Quantization for Fast Nearest Neighbor Search Zhujin Li 1, Xianglong Liu 1*, Junjie Wu 1, and Hao Su 2 1 Beihang University, Beijing, China 2 Stanford University, Stanford,
More informationGender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV
Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV Jan Vaněk and Josef V. Psutka Department of Cybernetics, West Bohemia University,
More informationPARALLELIZATION OF THE NELDER-MEAD SIMPLEX ALGORITHM
PARALLELIZATION OF THE NELDER-MEAD SIMPLEX ALGORITHM Scott Wu Montgomery Blair High School Silver Spring, Maryland Paul Kienzle Center for Neutron Research, National Institute of Standards and Technology
More informationForest Reranking for Machine Translation with the Perceptron Algorithm
Forest Reranking for Machine Translation with the Perceptron Algorithm Zhifei Li and Sanjeev Khudanpur Center for Language and Speech Processing and Department of Computer Science Johns Hopkins University,
More informationKarami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings.
Online Review Spam Detection by New Linguistic Features Amir Karam, University of Maryland Baltimore County Bin Zhou, University of Maryland Baltimore County Karami, A., Zhou, B. (2015). Online Review
More informationSparse Non-negative Matrix Language Modeling
Sparse Non-negative Matrix Language Modeling Joris Pelemans Noam Shazeer Ciprian Chelba joris@pelemans.be noam@google.com ciprianchelba@google.com 1 Outline Motivation Sparse Non-negative Matrix Language
More informationReport for each of the weighted automata obtained ˆ the number of states; ˆ the number of ɛ-transitions;
Mehryar Mohri Speech Recognition Courant Institute of Mathematical Sciences Homework assignment 3 (Solution) Part 2, 3 written by David Alvarez 1. For this question, it is recommended that you use the
More informationUnigrams Bigrams Trigrams
IMPLEMENTATION OF WORD BASED STATISTICAL LANGUAGE MODELS Frank Wessel, Stefan Ortmanns and Hermann Ney Lehrstuhl fur Informatik VI, RWTH Aachen, Uniersity of Technology, D-5056 Aachen, Germany, Phone/Fax:
More informationClassification. 1 o Semestre 2007/2008
Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class
More informationRecent Developments in Model-based Derivative-free Optimization
Recent Developments in Model-based Derivative-free Optimization Seppo Pulkkinen April 23, 2010 Introduction Problem definition The problem we are considering is a nonlinear optimization problem with constraints:
More informationJOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation
JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based
More informationA Quantitative Analysis of Reordering Phenomena
A Quantitative Analysis of Reordering Phenomena Alexandra Birch Phil Blunsom Miles Osborne a.c.birch-mayne@sms.ed.ac.uk pblunsom@inf.ed.ac.uk miles@inf.ed.ac.uk University of Edinburgh 10 Crichton Street
More informationDiscriminative Reranking for Machine Translation
Discriminative Reranking for Machine Translation Libin Shen Dept. of Comp. & Info. Science Univ. of Pennsylvania Philadelphia, PA 19104 libin@seas.upenn.edu Anoop Sarkar School of Comp. Science Simon Fraser
More informationScalable Trigram Backoff Language Models
Scalable Trigram Backoff Language Models Kristie Seymore Ronald Rosenfeld May 1996 CMU-CS-96-139 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 This material is based upon work
More informationInclusion of large input corpora in Statistical Machine Translation
Inclusion of large input corpora in Statistical Machine Translation Bipin Suresh Stanford University bipins@stanford.edu ABSTRACT In recent years, the availability of large, parallel, bilingual corpora
More informationLexicographic Semirings for Exact Automata Encoding of Sequence Models
Lexicographic Semirings for Exact Automata Encoding of Sequence Models Brian Roark, Richard Sproat, and Izhak Shafran {roark,rws,zak}@cslu.ogi.edu Abstract In this paper we introduce a novel use of the
More informationLearning Non-linear Features for Machine Translation Using Gradient Boosting Machines
Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines Kristina Toutanova Microsoft Research Redmond, WA 98502 kristout@microsoft.com Byung-Gyu Ahn Johns Hopkins University
More informationComplex Prediction Problems
Problems A novel approach to multiple Structured Output Prediction Max-Planck Institute ECML HLIE08 Information Extraction Extract structured information from unstructured data Typical subtasks Named Entity
More informationA Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation.
A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation May 29, 2003 Shankar Kumar and Bill Byrne Center for Language and Speech Processing
More informationBBN System Description for WMT10 System Combination Task
BBN System Description for WMT10 System Combination Task Antti-Veikko I. Rosti and Bing Zhang and Spyros Matsoukas and Richard Schwartz Raytheon BBN Technologies, 10 Moulton Street, Cambridge, MA 018,
More informationTracking of Human Body using Multiple Predictors
Tracking of Human Body using Multiple Predictors Rui M Jesus 1, Arnaldo J Abrantes 1, and Jorge S Marques 2 1 Instituto Superior de Engenharia de Lisboa, Postfach 351-218317001, Rua Conselheiro Emído Navarro,
More informationRegularization and model selection
CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial
More informationJoint Decoding with Multiple Translation Models
Joint Decoding with Multiple Translation Models Yang Liu and Haitao Mi and Yang Feng and Qun Liu Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of
More informationApplication of Support Vector Machine Algorithm in Spam Filtering
Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification
More informationLocal Phrase Reordering Models for Statistical Machine Translation
Local Phrase Reordering Models for Statistical Machine Translation Shankar Kumar, William Byrne Center for Language and Speech Processing, Johns Hopkins University, 3400 North Charles Street, Baltimore,
More informationKernel Density Construction Using Orthogonal Forward Regression
Kernel ensity Construction Using Orthogonal Forward Regression S. Chen, X. Hong and C.J. Harris School of Electronics and Computer Science University of Southampton, Southampton SO7 BJ, U.K. epartment
More informationShrinking Exponential Language Models
Shrinking Exponential Language Models Stanley F. Chen IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights, NY 10598 stanchen@watson.ibm.com Abstract In (Chen, 2009), we show that for a variety
More informationSupervised classification of law area in the legal domain
AFSTUDEERPROJECT BSC KI Supervised classification of law area in the legal domain Author: Mees FRÖBERG (10559949) Supervisors: Evangelos KANOULAS Tjerk DE GREEF June 24, 2016 Abstract Search algorithms
More informationAutomatic Domain Partitioning for Multi-Domain Learning
Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels
More informationCPSC 340: Machine Learning and Data Mining. Multi-Class Classification Fall 2017
CPSC 340: Machine Learning and Data Mining Multi-Class Classification Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation
More information27: Hybrid Graphical Models and Neural Networks
10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look
More informationComputing Optimal Alignments for the IBM-3 Translation Model
14th Conference on Computational Natural Language Learning, Uppsala, Sweden, 2010 Computing Optimal Alignments for the IBM-3 Translation Model Thomas Schoenemann Centre for Mathematical Sciences Lund University,
More informationIllumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model
Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model TAE IN SEOL*, SUN-TAE CHUNG*, SUNHO KI**, SEONGWON CHO**, YUN-KWANG HONG*** *School of Electronic Engineering
More informationarxiv: v1 [cs.cv] 2 May 2016
16-811 Math Fundamentals for Robotics Comparison of Optimization Methods in Optical Flow Estimation Final Report, Fall 2015 arxiv:1605.00572v1 [cs.cv] 2 May 2016 Contents Noranart Vesdapunt Master of Computer
More informationDivide and Conquer Kernel Ridge Regression
Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem
More informationSemantic Word Embedding Neural Network Language Models for Automatic Speech Recognition
Semantic Word Embedding Neural Network Language Models for Automatic Speech Recognition Kartik Audhkhasi, Abhinav Sethy Bhuvana Ramabhadran Watson Multimodal Group IBM T. J. Watson Research Center Motivation
More informationBLANC: Learning Evaluation Metrics for MT
BLANC: Learning Evaluation Metrics for MT Lucian Vlad Lita and Monica Rogati and Alon Lavie Carnegie Mellon University {llita,mrogati,alavie}@cs.cmu.edu Abstract We introduce BLANC, a family of dynamic,
More informationModern Methods of Data Analysis - WS 07/08
Modern Methods of Data Analysis Lecture XV (04.02.08) Contents: Function Minimization (see E. Lohrmann & V. Blobel) Optimization Problem Set of n independent variables Sometimes in addition some constraints
More informationDatabase-Text Alignment via Structured Multilabel Classification
Database-Text Alignment via Structured Multilabel Classification Benjamin Snyder and Regina Barzilay Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology {bsnyder,regina}@csail.mit.edu
More informationConstrained Discriminative Training of N-gram Language Models
Constrained Discriminative Training of N-gram Language Models Ariya Rastrow #1, Abhinav Sethy 2, Bhuvana Ramabhadran 3 # Human Language Technology Center of Excellence, and Center for Language and Speech
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY Memory-Based Machine Translation and Language Modeling
The Prague Bulletin of Mathematical Linguistics NUMBER 91 JANUARY 2009 17 26 Memory-Based Machine Translation and Language Modeling Antal van den Bosch, Peter Berck Abstract We describe a freely available
More informationWebMining: An unsupervised parallel corpora web retrieval system
WebMining: An unsupervised parallel corpora web retrieval system Jesús Tomás Instituto Tecnológico de Informática Universidad Politécnica de Valencia jtomas@upv.es Jaime Lloret Dpto. de Comunicaciones
More informationOn Hierarchical Re-ordering and Permutation Parsing for Phrase-based Decoding
On Hierarchical Re-ordering and Permutation Parsing for Phrase-based Decoding Colin Cherry National Research Council colin.cherry@nrc-cnrc.gc.ca Robert C. Moore Google bobmoore@google.com Chris Quirk Microsoft
More informationFirst Steps Towards Coverage-Based Document Alignment
First Steps Towards Coverage-Based Document Alignment Luís Gomes 1, Gabriel Pereira Lopes 1, 1 NOVA LINCS, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal ISTRION BOX, Translation
More informationBayesian Classification Using Probabilistic Graphical Models
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2014 Bayesian Classification Using Probabilistic Graphical Models Mehal Patel San Jose State University
More informationSystem Combination Using Joint, Binarised Feature Vectors
System Combination Using Joint, Binarised Feature Vectors Christian F EDERMAN N 1 (1) DFKI GmbH, Language Technology Lab, Stuhlsatzenhausweg 3, D-6613 Saarbrücken, GERMANY cfedermann@dfki.de Abstract We
More informationStabilizing Minimum Error Rate Training
Stabilizing Minimum Error Rate Training George Foster and Roland Kuhn National Research Council Canada first.last@nrc.gc.ca Abstract The most commonly used method for training feature weights in statistical
More informationA New Combinatorial Design of Coded Distributed Computing
A New Combinatorial Design of Coded Distributed Computing Nicholas Woolsey, Rong-Rong Chen, and Mingyue Ji Department of Electrical and Computer Engineering, University of Utah Salt Lake City, UT, USA
More informationCOMBINING FEATURE SETS WITH SUPPORT VECTOR MACHINES: APPLICATION TO SPEAKER RECOGNITION
COMBINING FEATURE SETS WITH SUPPORT VECTOR MACHINES: APPLICATION TO SPEAKER RECOGNITION Andrew O. Hatch ;2, Andreas Stolcke ;3, and Barbara Peskin The International Computer Science Institute, Berkeley,
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationIntroduction to Statistical Learning Theory
Introduction to Statistical Learning Theory Julia Kempe & David S. Rosenberg NYU CDS January 29, 2019 Julia Kempe & David S. Rosenberg (NYU CDS) DS-GA 1003 / CSCI-GA 2567 January 29, 2019 1 / 29 Decision
More informationOn the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes
On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes Mary Melekopoglou Anne Condon Computer Sciences Department University of Wisconsin - Madison 0 West Dayton Street Madison,
More informationComparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation
Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation Markus Dreyer, Keith Hall, and Sanjeev Khudanpur Center for Language and Speech Processing Johns Hopkins University 300
More informationExplicit fuzzy modeling of shapes and positioning for handwritten Chinese character recognition
2009 0th International Conference on Document Analysis and Recognition Explicit fuzzy modeling of and positioning for handwritten Chinese character recognition Adrien Delaye - Eric Anquetil - Sébastien
More informationLOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS
LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS Tara N. Sainath, Brian Kingsbury, Vikas Sindhwani, Ebru Arisoy, Bhuvana Ramabhadran IBM T. J. Watson
More informationNoise filtering for television receivers with reduced memory
Noise filtering for television receivers with reduced memory R. J. Schutten, G. de Haan and A. H. M. van Roermund. Philips Research Laboratories, Television Systems Group, Prof. Holstlaan 4, 5656 AA Eindhoven,
More information