Ling/CSE 472: Introduction to Computational Linguistics. 4/6/15: Morphology & FST 2


Overview
- Review: FSAs & FSTs
- XFST
- xfst demo
- Examples of FSTs for spelling change rules
- Reading questions

Review: FSAs and FSTs
- FSAs define sets of strings (regular languages).
- FSTs define sets of ordered pairs of strings (regular relations).
- Formally interesting because not all languages/relations can be defined by FSAs/FSTs.
- Are all finite languages and relations regular?
- Linguistically interesting because:
  - FSAs have enough power for morphotactics.
  - FSTs have (almost) enough power for morphophonology.
  - Both are very efficient.
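
For instance, in the xfst regex notation introduced below (toy examples, not from the slides):

    ! An FSA, i.e. a regular language: the two-string set {"cat", "dog"}
    read regex [c a t | d o g] ;
    ! An FST, i.e. a regular relation: the single pair <"cat+s", "cats">
    read regex [c a t %+:0 s] ;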

FSTs: Quiz
- Why do FSTs have complex symbols labeling the arcs?
- What happens if you give an FST an input on only one tape?
- What happens if the input has symbols outside the FST's alphabet?
- Do the upper and lower tape strings always have the same length?
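
A toy sketch (assumed xfst usage, not from the slides) that makes some of these questions concrete:

    ! A small transducer: upper "cat+s", lower "cats".
    read regex [c a t %+:0 s] ;
    ! apply down cat+s   -- input is matched against the upper tape only,
    !                       and the FST fills in the lower tape: cats
    ! apply up cats      -- input on the lower tape; the FST returns cat+s
    ! apply up dogs      -- d, o and g are outside this network's alphabet,
    !                       so there is no accepting path and no output
    ! The two tapes need not be the same length: %+:0 pairs the symbol +
    ! with the empty string.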

xfst regex syntax Why is it so different from what we see in J&M and elsewhere? Why are there so many operators?
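
For orientation, a sample of the operator inventory (a partial list, not tied to any particular slide):

    ! A B       concatenation            A | B    union
    ! A* / A+   Kleene star / plus       (A)      optionality
    ! ~A        complement               A & B    intersection
    ! A - B     subtraction              $A       containment
    ! A:B       symbol pair              A .x. B  cross product
    ! A .o. B   composition              A -> B || L _ R   conditional replace
    ! ?         any single symbol        0        epsilon
    ! %         escapes special characters, e.g. %+ is a literal plus sign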

B&K minimal languages

B&K: Iteration

B&K: Concatenation

B&K: Cross-product

B&K: Composition

B&K: Closure

B&K: Universal relations

B&K: Universal relations Why are these useful?

B&K: Replacement

Overview
- Review: FSAs & FSTs
- XFST
- xfst demo
- Examples of FSTs for spelling change rules
- Reading questions

Sample e-insertion FST
The idea is to add e only in the proper environments while letting all other sequences pass through.
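
As a rough sketch of the same idea in xfst replace-rule notation (an illustration using the classic fox+s example, not the FST drawn on the slide; the names EInsert, BDelete and EInsertion are made up here):

    ! Insert e between a sibilant-final stem and the suffix s, then delete
    ! the morpheme boundary; everything else passes through unchanged.
    define EInsert  [ [..] -> e || [s | z | x] %+ _ s ] ;
    define BDelete  [ %+ -> 0 ] ;
    define EInsertion [ EInsert .o. BDelete ] ;
    ! Expected behaviour: apply down fox+s  ->  foxes
    !                     apply down cat+s  ->  cats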

Spelling change rule FST 1
define Rule1 [?* e:0 %+:0 [e|i] ?*];
- Draw an FST corresponding to Rule1.
- What are the upper and lower languages of Rule1?
- What linguistic work is this rule supposed to do?
- If the upper tape has expect+ed, what goes on the lower tape?
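
One way to explore these questions is to compile the rule and apply it down (a hypothetical xfst session; the outputs are left for you to check):

    xfst[0]: define Rule1 [?* e:0 %+:0 [e|i] ?*] ;
    xfst[0]: read regex Rule1 ;
    xfst[1]: apply down expect+ed
    ! try other upper-tape strings of your own as well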

Spelling change rule FST 2
define Rule2 [[?* e:0 %+:0 [e|i] ?*] | [?* e %+:0 (\[e|i])] | [?* \e %+ ?*] | [\[%+]*]];
- What are the upper and lower languages of Rule2?
- What linguistic work is each part of this rule supposed to do?
- If the upper tape has expect+ed, what goes on the lower tape?
- If the upper tape has write+ing, what goes on the lower tape?
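
The same kind of check applies here (a hypothetical session, assuming Rule2 has been defined as on the slide; predict the outputs before running):

    xfst[0]: read regex Rule2 ;
    xfst[1]: apply down expect+ed
    xfst[1]: apply down write+ing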

Overview
- Review: FSAs & FSTs
- XFST
- xfst demo
- Examples of FSTs for spelling change rules
- Reading questions

Reading questions I'm confused about how "A*", the "union of A+ with the empty string language", means "the concatenation of A with itself zero or more times" (B&K page 47). I understand the Kleene star notation but I don't think I understand how "zero or more of something" is meant to come out of the union of a relation with the empty string language. Maybe I'm confused about the terminology too?
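
One way to see the identity concretely (an xfst sketch; the defined name A is just for illustration):

    ! A+ means "one or more copies of A concatenated"; unioning in the
    ! empty-string language [0] adds the zero-copy case, giving A*.
    define A [a | b] ;
    read regex [A*] ;
    read regex [A+ | 0] ;
    ! xfst's "test equivalent" command, if available in your build,
    ! compares the top two networks on the stack; these two compile to
    ! the same language.
    test equivalent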

Reading questions I don't understand how the universal relation works. I understand what it produces, at least (a mapping from a string to any other string), but the graph confuses me. How does the expression [?* .x. ?*] relate to the graph shown in figure 2.8?
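
A compact way to put the connection (a sketch based on the expression, not the book's figure itself):

    ! The universal relation pairs any string with any string:
    read regex [?* .x. ?*] ;
    ! To let the two sides differ both in which symbols appear and in
    ! length, the compiled network needs looping arcs along the lines of
    ! ?:? (one symbol maps to another), ?:0 (a symbol on the upper side
    ! with nothing below) and 0:? (the reverse) -- which appears to be
    ! what figure 2.8 is drawing.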

Reading questions I think I understand roughly all the different operations in the chapter, but I got pretty confused by the last part; it would help if we could go over an example of conditional replacement and parallel conditional replacement. On page 67, the authors talk about replacement ambiguity that comes from mapping upward rather than downward because of the difference between 'original strings and strings that are the result of a replacement'. Could you explain where this ambiguity comes from and why it matters if the string is the result of a previous replacement or not?
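
A toy illustration of the two constructs (not an example from the book):

    ! Conditional replacement: rewrite a as b only between x and y.
    read regex [a -> b || x _ y] ;
    ! apply down xay  -> xby      apply down za -> za (context not met)

    ! Parallel replacement: the rules apply simultaneously (note the ,,),
    ! so neither one feeds the other -- here a and b are swapped.
    read regex [a -> b || x _ y ,, b -> a || x _ y] ;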

Reading questions In the text it says, "composition is inherently an operation on relations and not on languages." Does this mean that by using some of these operators we are taking languages and creating relations? The reading also wasn't clear to me on the projection operator. Is projection then only used to denote which language, upper or lower, the automaton references? I'm confused as to its usefulness. Is the reason that the expression a 0 b translates to the relation {"ab"} that there is no epsilon in networks?
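
A small sketch that bears on both questions (toy expressions; the defined name R is invented here):

    ! Composition chains two relations: the lower side of the first is
    ! matched against the upper side of the second.
    read regex [a:b] .o. [b:c] ;     ! denotes the single pair <"a","c">
    ! Composing with a plain language treats it as an identity relation,
    ! which is one sense in which these operators turn languages into
    ! relations.

    ! Projection throws away one side of a relation, leaving a language:
    define R [c a t %+:0 s] ;
    read regex R.u ;                 ! upper language: {"cat+s"}
    read regex R.l ;                 ! lower language: {"cats"}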

Reading questions I understand the concept of concatenation, but in the reading, I am confused about why a:0 b:a and a b:0 are formulated as the same language/relation: {<"ab">, <"a">}. I can sort of understand why a:0 b:a -> {<"ab">, <"a">}, but why doesn't a b:0 -> {<"a">, <"b">}? I understood figure 2.8 differently based on the definitions. My understanding from figure 2.8 was any character + cross product of a character to another character + any other character, where the first two are of equal length. In this example, I don't know why there should be 0:? and ?:0 to connect strings of different length to each other. The replace relations are not necessarily one-to-one even if the replacement language contains just one string, but possible replacements may overlap, so why not restrict the replace relations to one-to-one only to avoid this problem?
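
A worked version of the concatenation point, reading each tape off separately (same symbols as the question):

    ! a:0 b:a   upper tape: a b = "ab"    lower tape: 0 a = "a"
    ! a b:0     upper tape: a b = "ab"    lower tape: a 0 = "a"
    ! The epsilons (0) vanish when the tapes are read off as strings, so
    ! both expressions denote the same relation, mapping "ab" to "a" --
    ! neither denotes a relation between "a" and "b".
    read regex [a:0 b:a] ;
    read regex [a b:0] ;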

Reading questions The reading says that while [a => .#. ~[b ?*] _ ] expresses that 'a' cannot be preceded by a string that starts with 'b', [a => ~[b ?*] _ ] allows 'a' to appear anywhere. I don't really understand why this is true. It seems to me like the second regular expression would instead express that 'a' cannot be preceded by a string that contains 'b' in any location.
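
A sketch of why the boundary symbol matters (this restates the book's point in xfst notation; the strings are just illustrative):

    ! a => L _ R requires every a to have some string from L ending
    ! immediately to its left.  ~[b ?*] contains the empty string, so
    ! without an anchor the requirement is met at every position:
    read regex [a => ~[b ?*] _ ] ;          ! effectively no restriction
    ! Prefixing .#. forces the left context to start at the beginning of
    ! the string, so the whole prefix must avoid starting with b:
    read regex [a => .#. ~[b ?*] _ ] ;
    ! To ban b anywhere before a, the left context would instead use
    ! negated containment:
    read regex [a => .#. ~$[b] _ ] ;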

Reading questions Why are regular relations not closed in general under intersection, complementation, and subtraction? Can we try to come up with examples that illustrate this? The book tried to give an example of this using intersection on page 54, but I did not understand it.
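
A sketch along the lines of the standard counterexample, written in xfst notation (not necessarily the book's exact formulation; R1 and R2 are invented names):

    ! R1 pairs a^n c^m with b^n: each a becomes b, each c is deleted.
    define R1 [ [a:b]* [c:0]* ] ;
    ! R2 pairs a^n c^m with b^m: each a is deleted, each c becomes b.
    define R2 [ [a:0]* [c:b]* ] ;
    ! A pair <a^n c^m, b^k> is in both only if k = n and k = m, so the
    ! intersection would be { <a^n c^n, b^n> }.  Its upper language,
    ! a^n c^n, is not regular, so the intersection is not a regular
    ! relation and cannot be represented by any finite-state transducer.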

Reading questions The any symbol "?" as the unknown symbol in network diagrams: I understand the use of "?" in regular expressions, but I do not fully understand why they overloaded the meaning of the symbol in network diagrams. Could you actually go over a network diagram with the "?" in it and explain what actually happens when you get to the "?".

Reading questions The Xerox compiler syntax for regular expressions from B&K is very different from the regular expressions we have worked with. Will we be expected to study and use this syntax for future assignments? Also, are there transducer minimization algorithms in the same way there are ones for FSM minimization? How does a phrase-structure grammar dependent on center-embedding generate a context-free language? What exactly are context-free languages and why can't they be encoded by finite-state networks?

Reading questions The reading discusses cross products applied to two regular expressions. Is the cross product of two sets useful in NLP, or did they support the operation for more formal reasons (i.e., to support regular languages as they are formally defined, rather than in regard to their use in processing language)? How does the Xerox regular expression compiler work? The text says that it creates networks that are epsilon-free and minimal -- where can I find the algorithms that are being used to minimize them? More generally, how do you actually prove that an FSM is minimal?
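
On the first question, one modest illustration of where a cross product earns its keep (invented words, just a sketch):

    ! Relate a whole set of surface forms to a single cover symbol without
    ! listing the pairs one by one.
    define Colours [r e d | b l u e | g r e e n] ;
    read regex [Colours .x. [C O L O U R]] ;
    ! apply up COLOUR   -> red, blue, green
    ! apply down green  -> COLOUR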

Reading questions Also, if it's possible, could you please post the solution for "the better cola machine" on page 466? I really want to know how to do it.