INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct


1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 8, 12 Oct. 2016 jtl@ifi.uio.no

Today 2
- Preparing bitext
- Parameter tuning
- Reranking
- Some linguistic issues

STMT so far 3
We have seen all the main elements of a phrase-based STMT system:
- The model
- How to train it
- Decoding
For actually building a system, we will use tools implementing this model, e.g. Moses. Some details remain at both ends:
- Start: preprocessing
- End: tuning

Preprocessing 4
(Exercise 4.1) Which preparations must be done to a training corpus to make it suitable for training an SMT system?
- Clean up text
- Character encoding
- Mark-up (XML, HTML)
- Tokenization
Why?

Preprocessing: casing 5
Case folding (downcasing) or not?
- The city is large. → the city is large.
- Washington is large. → washington is large.
Why not?
- Mary met Smith and Browne → mary met smith and browne
- Adds ambiguity
- May translate proper names wrongly

Alternative: true casing 6
What:
- The city is large. → the city is large.
- Washington is large. → Washington is large.
How:
- Read through the corpus; count occurrences which are not sentence-initial → model
- Use the model on the corpus
Remaining problem: the beginning of the sentence:
- Browne met Smith?
- Green men entered the room
We cannot always get it right, but true casing is the best alternative.
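The counting procedure above can be sketched in a few lines of Python (a minimal sketch with made-up function names, not the Moses truecaser):

```python
from collections import Counter, defaultdict

def train_truecaser(sentences):
    """Count the casing variants of each word, skipping the
    (unreliable) sentence-initial position."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for token in sent[1:]:               # skip sentence-initial token
            counts[token.lower()][token] += 1
    # keep the most frequent casing of each word
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def truecase(sent, model):
    """Recase every token to its most frequent non-initial casing;
    unseen words are left unchanged."""
    return [model.get(tok.lower(), tok) for tok in sent]

corpus = [
    "The city is large .".split(),
    "Washington is large .".split(),
    "We visited Washington .".split(),
]
model = train_truecaser(corpus)
print(truecase("washington is large .".split(), model))
# → ['Washington', 'is', 'large', '.']
```

Note that sentence-initial tokens are excluded from training but still recased at application time, which is exactly where the remaining errors (Browne, Green) come from.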

Sentence align the source and target 7
Assume two parallel texts which are split into sentences:
E: e1, e2, …, en
F: f1, f2, …, fm
Not necessarily equally many.
Task: organize E and F into equally many segments
E: E1, E2, …, Ek
F: F1, F2, …, Fk
where Ei corresponds to Fi:
- Ei consists of some consecutive sentences: ej, ej+1, …, ej+s
- Similarly for Fi: fk, fk+1, …, fk+u
- Ei and Fi are minimal: there is no sub-segment Ei′ of Ei and Fi′ of Fi such that Ei′ and Fi′ correspond and Ei′ ≠ Ei or Fi′ ≠ Fi

Sentence alignment continued 8
Sentence pairs: x many sentences in Ei, y many sentences in Fi. Then call Ei-Fi an x-y pair.
- 1-1 pair: good for training
- 1-0 pair or 0-1 pair: ignore for training
- 1-2 or 2-1 pairs: might be considered for training
How to identify corresponding sentences?
- Sentence length
- Word pairs from a dictionary
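The length-based idea can be sketched as a dynamic program over the x-y pair types listed above (a toy stand-in for the Gale & Church length model: the cost here is a crude length-difference ratio, and the function names are my own):

```python
def sentence_align(src, tgt):
    """Dynamic-programming alignment over 1-1, 1-0, 0-1, 2-1 and 1-2
    sentence pairs, scored by a crude length-difference cost (a toy
    stand-in for the Gale & Church length model)."""
    INF = float("inf")

    def cost(es, fs):
        if not es or not fs:          # 1-0 / 0-1 pairs: fixed penalty
            return 3.0
        le = sum(len(s) for s in es)
        lf = sum(len(s) for s in fs)
        return abs(le - lf) / max(le, lf)

    n, m = len(src), len(tgt)
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == INF:
                continue
            for di, dj in moves:
                if i + di <= n and j + dj <= m:
                    c = best[i][j] + cost(src[i:i + di], tgt[j:j + dj])
                    if c < best[i + di][j + dj]:
                        best[i + di][j + dj] = c
                        back[i + di][j + dj] = (di, dj)
    pairs, i, j = [], n, m        # trace back the chosen x-y pairs
    while i or j:
        di, dj = back[i][j]
        pairs.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    return list(reversed(pairs))
```

For example, two source sentences whose combined length matches one target sentence come out as a single 2-1 pair, while equal-length sentence lists come out as 1-1 pairs.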

Today 9 Preparing bitext Parameter tuning Reranking Some linguistic issues

The generative SMT-model 10
Adding weights: Koehn, lecture 5, slides 17-21
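The weighted model referred to here is the standard log-linear formulation (the textbook form from Koehn, not something specific to these slides):

```latex
p(\mathbf{e} \mid \mathbf{f}) = \frac{1}{Z}\,\exp\!\Big(\sum_{i=1}^{n} \lambda_i\, h_i(\mathbf{e}, \mathbf{f})\Big)
```

where the h_i are the component models (language model, phrase translation model, reordering/distortion model, word penalty, …) and the λ_i are the weights that tuning sets.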

How to tune weights? 11
1. Make an original system, S0, using a parallel corpus, C1, for the phrase table.
2. Use a distinct small parallel corpus, C2 (the dev set).
3. Produce several S0-translations for each f-sentence in C2: an n-best list (n = 100, 1000, 10000).
4. Use a method for scoring the candidate translations in C2 (typically a modified BLEU score).
5. Try to adjust the weights to bring the best candidates from (4) towards the top of the list.
6. Make a new system with the adjusted weights.
7. Repeat from 3 until convergence.


How to? (sec. 9.3) 14
5. Try to adjust the weights to bring the best candidates in (4) towards the top of the list.
No analytic solution: we can't differentiate a function and find zero values.
Take 1: try systematically, say
- λ_LM = .1, .2, .3, …, .9
- λ_φ = .1, .2, …, .9 − λ_LM
- λ_D = .1, .2, …, 1 − (λ_LM + λ_φ)
But:
- There are too many values to try out
- Small changes in the λs can have a large effect on the result: the steps are too large

Take 2: Powell search 15
- Optimize one λ, say λ_LM, keeping the others fixed.
- With this value for λ_LM, optimize the next λ, etc.
- A method for searching for the best value for each λ.
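Take 2 can be sketched as a grid-based line search over one weight at a time (a toy illustration, not Och's MERT with exact threshold points; the candidate representation and names are assumptions, with each candidate a pair of feature values and a quality score such as smoothed BLEU):

```python
def rerank_score(weights, nbest_lists):
    """System score under given weights: for each sentence, pick the
    candidate whose weighted model score is highest, and sum the
    quality (e.g. smoothed BLEU) of those picks.
    Each candidate is (feature_values, quality)."""
    total = 0.0
    for cands in nbest_lists:
        best = max(cands, key=lambda c: sum(w * h for w, h in zip(weights, c[0])))
        total += best[1]
    return total

def coordinate_search(weights, nbest_lists, grid=None, rounds=3):
    """Optimize one weight at a time over a fixed grid of values,
    keeping the others fixed, and cycle through the weights."""
    grid = grid or [i / 10 for i in range(1, 10)]
    weights = list(weights)
    for _ in range(rounds):
        for i in range(len(weights)):
            weights[i] = max(grid, key=lambda v: rerank_score(
                weights[:i] + [v] + weights[i + 1:], nbest_lists))
    return weights
```

This illustrates the shape of the search only; real MERT exploits the piecewise-linear structure of the score along each coordinate instead of a fixed grid.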

Take 3: 16
(alternative) Simplex algorithm; variants of hill climbing.
Read sec. 9.3, but not:
- the details of finding threshold points and combining threshold points (sec. 9.3.2)
- sec. 9.3.3 (Simplex)

17 Will the solutions be global optima?

Today 18
- Preparing bitext
- Parameter tuning
- Reranking
- Some linguistic issues

Reranking model for SMT 19
Translation in two steps:
1. Use an STMT system as we have seen so far. Output: a set of translation candidates.
2. Use the reranker to rank the outputs and select between them: a discriminative model.

Statistical models 20
Generative model:
- Constructs solutions and assigns them probabilities
- Examples:
  - PCFG: assigns trees, and probabilities to the trees
  - HMM tagger: produces tag sequences with probabilities
  - The translation models, both IBM and phrase-based
Discriminative model:
- Starts with a set of solutions
- Selects between them on the basis of a statistical score
- Example: the Malt parser

The reranker 21
Consider it as a classification problem: supervised learning.
Training material:
- A set of sentences in the source language
- One or more reference translations of these
- Output of the STMT system for these sentences
Choose a learning goal (a way to evaluate the output sentences): typically a modified BLEU (or NIST) score.

The reranker - learning 22
Choose features.
Choose a learning strategy:
- Naïve Bayes
- Maximum entropy (INF5830; skip sec. 9.2.4 here)
- Etc.
The result is a ranker which (if we have succeeded) will return a reordered list where the top element has a better score than the top element before reranking.
Observe: this is machine learning. The resulting reranker will not always improve the results.
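One concrete instance of the "etc." above is a structured-perceptron reranker; the sketch below (candidate representation and names are my assumptions) moves the weights toward the oracle candidate, i.e. the one with the best BLEU, whenever the model ranks a different candidate on top:

```python
def perceptron_rerank_train(nbest_lists, n_features, epochs=10, lr=0.1):
    """Train reranker weights with a structured-perceptron update:
    whenever the model's top candidate differs from the oracle (the
    candidate with the best BLEU), move the weights toward the
    oracle's features.  Each candidate is (feature_values, bleu)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for cands in nbest_lists:
            model_best = max(cands, key=lambda c: sum(wi * hi for wi, hi in zip(w, c[0])))
            oracle = max(cands, key=lambda c: c[1])
            if model_best is not oracle:
                for i in range(n_features):
                    w[i] += lr * (oracle[0][i] - model_best[0][i])
    return w
```

As the slide warns, nothing guarantees that the learned weights improve the top-1 choice on unseen sentences.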

Reranking vs Tuning 23 What is the difference between Tuning and Reranking?

Reranking vs Tuning 24
Tuning is part of the training of the original STMT system. Tuning is applied to make an optimal translation system.
Reranking is part of a full MT pipeline. It is part of the decoding: it is applied to each sentence after the beam decoder has made an n-best list.

Today 25
- Preparing bitext
- Parameter tuning
- Reranking
- Some linguistic issues

SMT + Linguistic Information
- Transliteration
- Compounds
- Names
- Morphology
- Word order and syntax

Translating numbers 27
How should an STMT system translate 12356? Most likely it hasn't seen the number before. But since it translates into itself, one possible solution:
- Remove the number before translation; replace it by some dummy (say NNNN)
- Insert the number after translation
(A default of not translating unseen tokens would give the same result.)
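The dummy-token trick can be sketched as a pre-/post-processing pair (NNNN as in the slide; function names are my own, and re-inserting by position assumes the dummies survive translation in order):

```python
import re

NUM = re.compile(r"\b\d+\b")

def mask_numbers(sentence):
    """Replace each number with the dummy NNNN and remember the
    originals in order."""
    numbers = NUM.findall(sentence)
    return NUM.sub("NNNN", sentence), numbers

def unmask_numbers(translation, numbers):
    """Re-insert the stored numbers, assuming the dummies come out of
    the translation in the same order (a simplification)."""
    out = translation
    for num in numbers:
        out = out.replace("NNNN", num, 1)
    return out

masked, nums = mask_numbers("There are 12356 students")
# ... translate `masked` with the SMT system ...
print(unmask_numbers("Det er NNNN studenter", nums))
# → Det er 12356 studenter
```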

Transliteration
- Norwegian, German: 12,3 and 12 500 or 12.500
- English: 12.3 and 12 500 or 12,500
But number formats are not exactly the same even in closely related languages.
Solution: specific modules
- Translate specific phenomena
- Taken out of the regular SMT

Transliteration - names
How should names in Japanese or Russian be spelled in English? The book describes recipes for doing this statistically (sec. 10.1). We will not consider this.

Compounds
- German, Norwegian: samhandlingsreform, snøskredfare
- English: separate words: cruising speed
- Phrase-based SMT handles these better than word-based
- New compounds / sparse data are still a challenge
- Compound splitting in the source language may help
- But how to put Humpty Dumpty together again (when going into German or Norwegian)?
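Compound splitting on the source side can be sketched with a frequency-based method in the style of Koehn & Knight (2003): split a word where the parts score better than the unsplit word, using the geometric mean of corpus counts. This is a simplified sketch, and the counts below are made up for illustration:

```python
def split_compound(word, counts, min_len=3):
    """Pick the segmentation with the highest geometric mean of the
    parts' corpus frequencies (simplified: only binary splits, no
    filler letters such as the Norwegian/German linking -s-)."""
    best_score, best_split = counts.get(word, 0), [word]
    for i in range(min_len, len(word) - min_len + 1):
        a, b = word[:i], word[i:]
        score = (counts.get(a, 0) * counts.get(b, 0)) ** 0.5
        if score > best_score:
            best_score, best_split = score, [a, b]
    return best_split

# made-up corpus counts for illustration
counts = {"snøskred": 50, "fare": 400, "snøskredfare": 1}
print(split_compound("snøskredfare", counts))   # → ['snøskred', 'fare']
```

Splitting helps translation into English; the reverse direction (merging parts back into a compound) is the Humpty Dumpty problem the slide mentions.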

Compounds in LOGON
A set of possible templates for translating a compound N1N2:
- Adj1 N2
- N1 N2
- N2 of N1
- etc.
Generate all possible transfer rules from the basic transfer rules for N1 and N2 from the dictionary (order of n×n). Filter against a monolingual target corpus.
Possible improvement: turn this into a probabilistic model.

Translating names
Which names should be translated and which not? How?
- Oslofjorden
- Rondane
- Statens lånekasse for utdanning
Candidate translations of Sognefjellsveien:
- Sognefjellsveien
- the Sognefjellsveien
- Sognefjell Road
- the Sognefjell Road
- Sogn Mountain Road
- the Sogn Mountain Road
- the Parish Mountain Road

Translating names 33 Difficult to specify a recipe that fits all.