WFST: Weighted Finite State Transducer. September 12, 2017 Prof. Marie Meteer

Similar documents
Weighted Finite State Transducers in Automatic Speech Recognition

Weighted Finite State Transducers in Automatic Speech Recognition

Ling/CSE 472: Introduction to Computational Linguistics. 4/6/15: Morphology & FST 2

Ping-pong decoding Combining forward and backward search

Chapter Seven: Regular Expressions

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

1. Draw the state graphs for the finite automata which accept sets of strings composed of zeros and ones which:

Lecture 3 Regular Expressions and Automata

Overview. Search and Decoding. HMM Speech Recognition. The Search Problem in ASR (1) Today s lecture. Steve Renals

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

Creating LRs with FSTs Part II Compiling automata and transducers

Tuesday, September 30, 14.

Context Free Languages and Pushdown Automata

Regular Expressions & Automata

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Weighted Finite-State Transducers in Computational Biology

Speech Recognition CSCI-GA Fall Homework Assignment #1: Automata Operations. Instructor: Eugene Weinstein. Due Date: October 17th

Finite-State Transducers in Language and Speech Processing

Slides for Faculty Oxford University Press All rights reserved.

Regular Expressions. Chapter 6

CMSC 132: Object-Oriented Programming II

Ambiguous Grammars and Compactification

CHAPTER TWO LANGUAGES. Dr Zalmiyah Zakaria

CS 314 Principles of Programming Languages. Lecture 3

Multiple Choice Questions

CS 314 Principles of Programming Languages

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

Theory of Programming Languages COMP360

Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

CSE 431S Scanning. Washington University Spring 2013

Lecture 9. LVCSR Decoding (cont d) and Robustness. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen

Languages and Compilers

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

Dr. D.M. Akbar Hussain

QUESTION BANK. Formal Languages and Automata Theory(10CS56)

Formal languages and computation models

CS402 - Theory of Automata Glossary By

Compiler Construction

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

DVA337 HT17 - LECTURE 4. Languages and regular expressions

XFST2FSA. Comparing Two Finite-State Toolboxes

Chapter 4: Regular Expressions

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Languages and Finite Automata

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CSE528 Natural Language Processing Venue:ADB-405 Topic: Regular Expressions & Automata. www. l ea rn ersd esk.weeb l y. com

CMSC 330: Organization of Programming Languages

Section 1.7 Sequences, Summations Cardinality of Infinite Sets

Automata Theory CS S-FR Final Review

Notes for Comp 454 Week 2

Lec-5-HW-1, TM basics

CS 441G Fall 2018 Exam 1 Matching: LETTER

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

Regular Expression Derivatives in Python

Finite-State and the Noisy Channel Intro to NLP - J. Eisner 1

Section 2.4 Sequences and Summations

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Lexical Analysis. Lecture 3-4

Regular Expressions. Steve Renals (based on original notes by Ewan Klein) ICL 12 October Outline Overview of REs REs in Python

Lexical Analysis - 1. A. Overview A.a) Role of Lexical Analyzer

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Behaviour Diagrams UML

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

This example uses the or-operator ' '. Strings which are enclosed in angle brackets (like <N>, <sg>, and <pl> in the example) are multi-character symb

Languages and Strings. Chapter 2

Formal Languages and Compilers Lecture VI: Lexical Analysis

Talen en Compilers. Johan Jeuring , period 2. January 17, Department of Information and Computing Sciences Utrecht University

Turing Machine Languages

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Formal Languages and Automata

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Theory of Computations III. WMU CS-6800 Spring -2014

Week 2: Syntax Specification, Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

(Refer Slide Time: 0:19)

CSc 453 Lexical Analysis (Scanning)

Turing Machines. A transducer is a finite state machine (FST) whose output is a string and not just accept or reject.

Non-deterministic Finite Automata (NFA)

Lecture 8. LVCSR Decoding. Bhuvana Ramabhadran, Michael Picheny, Stanley F. Chen

CMSC 330: Organization of Programming Languages

Chapter 3: Lexing and Parsing

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

1.3 Functions and Equivalence Relations 1.4 Languages

UNION-FREE DECOMPOSITION OF REGULAR LANGUAGES

Automating Construction of Lexers

CS308 Compiler Principles Lexical Analyzer Li Jiang

Lecture 8. LVCSR Training and Decoding. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen

Lexical Analysis. Implementation: Finite Automata

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

CMSC 330: Organization of Programming Languages. Context Free Grammars

Regular Models of Phonological Rule Systems

Chapter 18: Decidability

CS6160 Theory of Computation Problem Set 2 Department of Computer Science, University of Virginia

Transcription:

+ WFST: Weighted Finite State Transducer September 12, 217 Prof. Marie Meteer

+ FSAs: A recurring structure in speech 2 Phonetic HMM Viterbi trellis Language model Pronunciation modeling

+ The language of sheep: /baa+!/ 3 Final state State Transition n We can say the following things about this machine n It has 5 states n b, a, and! are in its alphabet n q is the start state n q 4 is an accept state n It has 5 transitions

+ Input tape acceptor b a! e 1 1 2 2 2,3 3 4 4 Reject Accept Brandeis CS114 213 Meteer

+ Non-Determinism Both define /baa+/ Input: b Brandeis CS114 213 Meteer a a Stay in q2 or go to q3?

+ Non-Determinism cont. n Yet another technique n Epsilon transitions n Key point: these transitions do not examine or advance the tape during recognition Brandeis CS114 213 Meteer

+ Equivalence n Non-deterministic machines can be converted to deterministic ones with a fairly simple construction n That means that they have the same power: n non-deterministic machines are not more powerful than deterministic ones in terms of the languages they can accept n Two basic approaches to ND recognition (used in all major implementations of regular expressions) 1. Either take a ND machine and convert it to a D machine and then do recognition with that. 2. Or explicitly manage the process of recognition as a statespace search (leaving the machine as is). Brandeis CS114 213 Meteer

+ Non-Deterministic Recognition: Search n In a ND FSA there exists at least one path through the machine for a string that is in the language defined by the machine. n But not all paths directed through the machine for an accept string lead to an accept state. n No paths through the machine lead to an accept state for a string not in the language. n Non-determinism doesn t get us more formal power and it causes headaches so why bother? n More natural (understandable) solutions Brandeis CS114 213 Meteer

+ Compositional Machines n Formal languages are just sets of strings 9 n Therefore, we can talk about various set operations (intersection, union, concatenation) n This turns out to be a useful exercise Union Concatenation Speech and Language Processing - Jurafsky and Martin 9/8/17

+ Weighted finite state acceptors 1 a/1 1 b/1 2/1 n Like a normal FSA but with costs on the arcs and final-states n Note: cost comes after /. For final-state, 2/1 means final-cost 1 on state 2. n View WFSA as a function from a string to a cost. n In this view, unweighted FSA is f : string {, }. n If multiple paths have the same string, take the one with the lowest cost. n This example maps ab to (3 = 1+1+1), all else to. Thanks for Mirko Hannemann for this slide

+ Weights vs. costs a/ b/2 1/ 11 2 c/1 3/1 n Use cost to refer to the numeric value, and weight when speaking abstractly, e.g.: n The acceptor above accepts a with unit weight. n It accepts a with zero cost. n It accepts bc with cost 4=2+1+1 n State 1 is final with unit weight. n The acceptor assigns zero weight to xyz. n It assigns infinite cost to xyz. Thanks for Mirko Hannemann for this slide

+ Combinations red/.5 green/.3 1 blue/ yellow/.6 n Union: parallel 2 /.8 6 eps/ eps/ red/.5 green/.4 blue/1.2 1 / 2 /.3 green/.3 1 blue/ 2 /.8 yellow/.6 3 green/.4 4 / blue/1.2 12 5 /.3 n Concatenation: series red/.5 green/.3 1 blue/ yellow/.6 2 eps/.8 3 green/.4 blue/1.2 4 / 5 /.3 n Kleene closure: arbitrary repetition 3 / eps/ green/.4 eps/ blue/1.2 1 / eps/.3 2 /.3

+ WSFAs in speech 13 Language modeling using/1 1 data/.66 intuition/.33 Pronunciation modeling d/1 1 ey/.5 ae/.5 2 2 3 (a) t/.3 dx/.7 is/.5 are/.5 is/1 3 4 ax/1 4 better/.7 worse/.3 5

+ WFS Transducer n Accept an input while producing an output c:a/.3 14 input output weight a:b/.1 1 a:a/.4 a:b/.3 b:a/.2 2 b:b/.5 3/.6 n Input phonemes: output words (a) d:data/1 1 ey:ε/.5 ae:ε/.5 2 t:ε/.3 dx:ε/.7 3 ax: ε /1 4 d:dew/1 5 uw:ε/1 6

+ FST for morphology: Foxes and Cats 15 9/12/17

+ Composition c:a/.3 a:b/.1 1 a:a/.4 a:b/.6 16 b:a/.2 2 b:b/.5 3/.6 b:c/.3 1 a:b/.4 2/.7 a c a à b a a à c b b c:b/.9 c:b/.7 (1, 2) a:b/1 a:c/.4 (, ) (1, 1) a:b/.8 (3, 2)/1.3 a c a à c b b

+ WFSTs in Action with OpenFST 17 n Create text file n Compile n Print n Show info n Union n Concatenate n Compose n Invert